


Vol. 38, Parts 1 and 2, June 1951




BIOMETRIKA. 


FOUNDED BY 


W. F. R. WELDON, FRANCIS GALTON AND KARL PEARSON


MANAGING EDITOR 


E. S. PEARSON


ASSOCIATE EDITORS 
M. G. KENDALL JOHN WISHART 


in consultation with 


HARALD CRAMER R. C. GEARY
J. B.S. HALDANE 


Reprinted by offset-litho, 1960 


ISSUED BY 
THE BIOMETRIKA OFFICE, UNIVERSITY COLLEGE, LONDON 


PRINTED AT THE UNIVERSITY PRESS, CAMBRIDGE 


[Issued 12 June 1951] 






















VOLUME 38, PARTS 1 AND 2 JUNE 1951





MAJOR GREENWOOD 
1880-1949 


Major Greenwood, Professor Emeritus of Epidemiology and Vital Statistics in the University 
of London, died suddenly at the age of 69 on 5 October 1949 while attending a scientific 
meeting. Thus was lost to the medical profession one of its outstanding and most influential 
specialists and teachers. An only child, he was born in 1880 in the East End of London 
where his father was in family practice. He was educated on the classical side at Merchant 
Taylors’ School and subsequently, in 1898, with a Buxton scholarship, studied medicine 
in University College and the London Hospital, gaining the conjoint qualification in 1904. 
He first acted as assistant to his father, but clearly with little interest in this side of the 
work for, within a matter of months, he forsook clinical medicine for good and, appointed 
demonstrator in physiology, he began a happy association with Leonard Hill in the London 
Hospital Medical College. Here, in addition to physiological research, mainly on the effects 
of pressure changes in man and animals (on which in 1906-7 he published alone or with 
Hill no less than eight papers, and devoted his Arris and Gale Lectures (1908) to the same 
topic), he had already revealed his interest in the special field to which his subsequent 
professional life was mainly devoted. His first published work, in the year of qualification, 
was a statistical study of the variability and interrelationships of human visceral weights 
appearing in this journal in 1904. Others in this early period related to statistical aspects of 
the opsonic index and the problem of marital infection in pulmonary tuberculosis. After 
a period of study under Karl Pearson, he was in 1910 (the year in which his first book, on 
the Physiology of the Special Senses, appeared) appointed statistician to the Lister Institute. 
Here he carried out statistical investigations into such diverse topics as the fatality of 
fractures and pneumonia in hospital practice, the epidemiology of plague, factors influencing 
rates of infant mortality, tuberculosis and cancer and, an early indication of his ultimate 
chief field of effort, an examination of factors determining the rise and spread of epidemic 
diseases. In the first World War he served for a time with the rank of Captain in the 
R.A.M.C. until 1917, when he was put in charge of the medical research subsection of the 
Ministry of Munitions. Here, naturally, his statistical energies were switched to the special 
problems of the hour, such as dietaries and energy expenditure in relation to munition 
workers, the incidence of tuberculosis in industry, the factors influencing labour wastage 
and sickness and accident frequency, but in the more general field he contributed to the 
study of problems associated with cerebro-spinal fever and influenza. In this period began 
his associations with Yule, Collis and May Smith on the industrial problems of accidents, 
sickness and fatigue. His book, with Collis, on the Health of the Industrial Worker, appeared 
in 1921, and in 1935, in the Heath Clark lectures, he returned to the discussion of the 
influence of environmental factors in industry. 

In 1919 he was put in charge of the medical statistics section of the newly created 
Ministry of Health, a post which he retained until 1928, the section being accommodated in 
the National Institute for Medical Research at Hampstead. Here, apart from the routine 
epidemiological and other statistical work for the Ministry, much of which appeared in 


unsigned contributions to official reports, the diversity and intensity of his labours are


reflected in over sixty individual or collaborate papers. In the earlier half of these ten 
years, in collaboration first with Woods and later with Yule, appeared his first writings on 
the frequency of multiple accidents and disease, work which led to so much fruitful sub- 
sequent investigation by others and which ranks among his most important contributions 
to statistical medicine. At this time, too, his deep interest in the historical aspects of 
epidemiology became apparent in his dissertations on the contributions made in this field 
by Sydenham and Galen. Among others of his researches at this period must be mentioned 
those on malignant disease, especially on the statistical aspects of the evaluation of different 
methods of therapy. Apart from their direct contributions to knowledge of the subject, 
these revealed the inadequacies of existing, and the potentialities of properly kept hospital 
records in throwing light on medical problems especially of disease description and thera- 
peutic assessment. In the wider field of routine medical statistics he produced at this time, 
with Granville Edge and on behalf of the League of Nations Health Organization, a series 
of booklets surveying the nature and content of the official vital statistics available in many 
European countries, designed to guide the unwary in the treacherous field of international 
comparisons. In the later part of his stay at the National Institute appeared the first of the 
series of joint papers with Topley on experimental epidemiology. The difficulties of human 
epidemiology might, it was thought, be lessened in study of populations of mice under 
strict control (‘a little epidemiological world of our own nearer to the heart’s desire’), the 
language of epidemiology might be more readily learned and the findings utilized in the 
interpretation of the more complicated happenings in man. Although fifteen years of 
constant questioning failed to produce all the answers, it did sensibly clarify many aspects 
of epidemic occurrences and revealed much of practical value, for example, on the limitation 
of protection by inoculation. This, perhaps the main single work of his life, was continued 
after his appointment in 1928 to the chair of Vital Statistics and Epidemiology at the 
London School of Hygiene and Tropical Medicine, where he remained until his retirement 
in 1945. The work is described in his Herter lectures in 1931, in Epidemics and Crowd 
Diseases (1935), in Experimental Epidemiology, M.R.C. Report No. 209 (1936) and in 
a ‘Statistical study of infectious diseases’ in the Journal of the Royal Statistical Society 
(1946). The switching of interest in recent years to the new therapeutic and prophylactic 
procedures in infectious diseases, and the pronounced success attendant on these new methods 
in reducing incidence and mortality has temporarily overshadowed the importance of this 
brilliant and painstaking teamwork. Following the death of Brownlee, Greenwood under- 
took the direction of the statistical staff of the M.R.C. and, on its formation, became 
chairman of its statistical committee, remaining so until 1948. The work of this committee, 
with members representative of the different kinds of statistical acumen, exercised con- 
siderable influence on medical research, and was a powerful stimulus to recognition of the 
need for and the more rapid diffusion of statistical knowledge throughout the profession. 
At the School of Hygiene and Tropical Medicine, in addition to the continuation of his 
long-term work with Topley, his interests widened and, both by his teaching and writings, 
his influence in and on the profession steadily increased. It is not surprising that many 
honours were bestowed on him. He was awarded the Buchanan Medal of the Royal Society 
in 1927 and the Guy Medal in gold of the Royal Statistical Society in 1945; he was Milroy 
lecturer (Royal College of Physicians) in 1922 and Herter lecturer in Baltimore in 1931. In 
1924 he was elected to the Fellowship of the Royal College of Physicians and of the Royal 
Society in 1928. From 1919 to 1934 he was honorary secretary of the Royal Statistical 







Society and became its president in 1934-6. He was an honorary member of the American 
Statistical Association and of the Indian National Academy of Science, and a member of 
the International Statistical Institute. In later years Greenwood was associated with 
Biometrika, first in an advisory editorial capacity and from 1946 until his death as one of 
the Trustees.

It is difficult at present to assess his contributions to the advance of knowledge—as one 
of the earliest medical men adequately equipped statistically, his investigations, individual 
and collaborate, perhaps naturally covered a multitude of individual subjects. Even from 
his earliest years, however, his principal interests were in the problems of epidemic disease, 
and probably his contributions here, together with his studies on industrial sickness and 
accidents, will remain his most abiding claims. Yet more important than his writings, 
perhaps, is the success with which he laboured throughout his professional life to introduce 
statistical ideas and methods into medical investigation. The opportunities afforded for 
this by his associations with the research and teaching organizations in London were 
eagerly grasped by him, and the many who sought his advice must feel indebted for the 
inspiration and guidance so consistently obtained. In those fortunate enough to have been 
more intimately associated with him remain the deep affection and respect offered only to 
the great. 

P. L. MCKINLAY 













TABLES OF THE 5% AND 0.5% POINTS OF PEARSON CURVES
(WITH ARGUMENT $\beta_1$ AND $\beta_2$) EXPRESSED IN
STANDARD MEASURE


By E. S. PEARSON AND MAXINE MERRINGTON


1. THE PURPOSE AND FORM OF THE TABLES 


A considerable proportion of current statistical research is concerned with the determination 
of the sampling distributions of statistics required either as estimators or for use in tests of 
significance. It is often the case that while the distribution itself cannot be expressed in any 
simple form, the sampling moments can be derived and numerical values calculated either 
precisely or as approximations. In a great number of cases such investigation shows that the 
distribution tends to the normal form as the sample size is increased. However, this informa- 
tion regarding the limit is often not sufficient, and we require an answer to the question: what 
error is involved in assuming normality when the sample size has a specified value, n? In 
other words, to what extent in practice will the knowledge of the expectation and standard 
error of a particular statistic suffice? 
Accumulated experience has shown that the moment ratios

$$\sqrt{\beta_1} = \gamma_1 = \mu_3/\mu_2^{3/2}, \qquad \beta_2 = \gamma_2 + 3 = \mu_4/\mu_2^{2} \tag{1}$$

provide extremely useful indices of departure from normality, but it is clear that the
knowledge that a certain statistic $T$ has, say, for $n = 20$, a sampling distribution with
$\sqrt{\beta_1} = 0.09$, $\beta_2 = 3.21$, does not immediately help us in assessing the error involved in
assuming that $T$ is normally distributed. There can be no complete answer to this question
since the first four moments do not in general specify a frequency function. However, we
have thought it useful to make available tables which will relate certain percentage points,
expressed in standard measure, to the $\beta_1$ and $\beta_2$ of the distribution, for curves of Karl
Pearson's system. These curves, derived from the differential equation

$$\frac{1}{y}\frac{dy}{dx} = \frac{x - a}{c_0 + c_1 x + c_2 x^2}, \tag{2}$$

while differing in algebraic form, are continuous in shape for changes in $\beta_1$ and $\beta_2$. If the
frequency curve is denoted by

$$y = f(x), \qquad b_1 < x < b_2,$$

and $x$ has expectation $\mu$ and standard deviation $\sigma$, then we have tabulated the values of
$u = (x_\alpha - \mu)/\sigma$, where

$$\int_{b_1}^{x_\alpha} f(x)\,dx = \alpha, \tag{3}$$

for $\alpha$ = 0.005, 0.05, 0.95 and 0.995. This we have done for the combination of values of
$\beta_1$ and $\beta_2$ shown in the tables below. We have used $\beta_1 = \gamma_1^2$ rather than $\sqrt{\beta_1}$ as argument in
order to make use of an earlier tabulation (Pearson, 1925, pp. 439-42, and Tables for
Statisticians and Biometricians, vol. 2, Tables L); the skewness has been assumed positive,
i.e. $\mu_3 > 0$.
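The definitions in equations (1) and (3) can be illustrated numerically. The following is a minimal sketch, not the authors' computing procedure: it assumes a Type III (gamma) curve with shape parameter `k = 10` and unit scale, whose central moments are $\mu_2 = k$, $\mu_3 = 2k$, $\mu_4 = 3k^2 + 6k$, and it solves equation (3) by Simpson integration and bisection. The function names, the integration range and the step counts are illustrative choices.

```python
import math

def moment_ratios(mu2, mu3, mu4):
    """Return (beta1, beta2) from the central moments, as in equation (1)."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    return beta1, beta2

def gamma_pdf(x, k):
    """Density of a gamma (Type III) curve with shape k and unit scale."""
    if x <= 0.0:
        return 0.0
    return math.exp((k - 1.0) * math.log(x) - x - math.lgamma(k))

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def standardized_point(pdf, lower, upper, mean, sd, alpha):
    """Solve equation (3) for x_alpha by bisection; return (x_alpha - mean) / sd."""
    lo, hi = lower, upper
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if simpson(pdf, lower, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (0.5 * (lo + hi) - mean) / sd

k = 10.0  # assumed shape parameter, chosen so that beta1 = 0.4 and beta2 = 3.6
beta1, beta2 = moment_ratios(k, 2 * k, 3 * k * k + 6 * k)
u_lower = standardized_point(lambda x: gamma_pdf(x, k), 0.0, 60.0,
                             k, math.sqrt(k), 0.05)
print(beta1, beta2, u_lower)
```

Under these assumptions the lower 5% deviate comes out close to -1.447, which may be compared with the entry 1.45 (taken as negative) in Table 1 at $\beta_1 = 0.40$, $\beta_2 = 3.6$.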

We should have liked to have included tables giving additional percentage points, e.g. for
$\alpha$ and $1 - \alpha$ equalling 0.01 and 0.025, and it is hoped to compute these at a later date. In the
first instance, however, the choice of the levels, 0.005 and 0.05, was fixed for us by the
earlier work. With these two points known at each end of the distribution, the broad effect of
departure from normality can be judged.










Our reasons for believing that the percentage points of Pearson curves, tabulated according
to $\beta_1$ and $\beta_2$, will provide useful approximations to the corresponding points of other
distributions for which only the sampling moments are known, may be summarized as follows:

(a) These curves represent the exact distribution of a number of the more important 
statistics in common use, a property which is not shared, for instance, by curves of the 
Gram-Charlier system. This suggests a general appropriateness of form. 

(b) In several cases percentage points have first been calculated using a Pearson curve
with the correct first four moments, and later computational work has substantially con- 
firmed the accuracy of the approximation. This occurred in the case of the distribution of 
range, where the table of percentage points based on Type I and VI curves in use from 1932 
was superseded for n<12 by the exact calculations of 1942 (Pearson, 1932; Hartley & 
Pearson, 1942). In other cases, for example with forms of the likelihood ratio developed by 
Wilks, the adequacy of approximations based on this system of curves has been checked by 
comparison in certain special cases where the true distribution is known. 

It is in the Type IV area of the $\beta_1$, $\beta_2$-plane (see Fig. 1) where there is at present least
evidence regarding the appropriateness of this curve system in sampling theory. For $\beta$'s
in the Type IV area there are a number of alternative curve systems, and work aimed at
comparing their different properties is in progress. But whatever the outcome of such
research, there can be little doubt that within this area of the $\beta_1$, $\beta_2$-plane, the Type IV
approximation will give far more accurate values to the percentage points than would be
obtained from the normal probability scale.


2. ILLUSTRATIONS OF THE USE OF THE TABLES 
Example 1 
Bailey (1950), in considering the mathematical theory of what he terms a simple stochastic
epidemic, gives a table of moments for the time of completion of an epidemic, when all
available susceptibles have been exhausted. Thus if $\tau$ is the completion time in a community
containing $n$ susceptibles, he shows that appreciable skewness and kurtosis remain in the
distribution of $\tau$, even when $n$ is very large. With $n = 80$, he gives:





$\gamma_1$ | $\gamma_2$ | $\beta_1 = \gamma_1^2$ | $\beta_2 = \gamma_2 + 3$
0.771 | 1.114 | 0.594 | 4.114

The following figures for percentage levels of $(\tau - \bar{\tau})/\sigma_\tau$ may be derived by interpolation in
Tables 1-4 below:

 | Lower 0.5% | Lower 5% | Upper 5% | Upper 0.5%
Using Type IV approx. | -1.95 | -1.41 | 1.82 | 3.33
From ‘normal’ probability scale | -2.58 | -1.64 | 1.64 | 2.58























The figures give added meaning to Bailey’s measures of skewness and kurtosis and a com- 
parison with the normal-scale percentage levels is instructive. 
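The interpolation in Example 1 can be sketched as follows. This is an illustration only: it assumes simple linear interpolation in both arguments between the four neighbouring entries of each table ($\beta_1$ = 0.50 and 0.60, $\beta_2$ = 4.0 and 4.2), whereas the text does not state which interpolation rule was applied. The function name and the layout of `CORNERS` are hypothetical; the lower-tail entries are entered with the negative sign noted under Tables 1 and 3.

```python
def bilinear(b1, b2, b1_lo, b1_hi, b2_lo, b2_hi, corners):
    """Linear interpolation in beta1, then in beta2.

    corners = ((v(b2_lo, b1_lo), v(b2_lo, b1_hi)),
               (v(b2_hi, b1_lo), v(b2_hi, b1_hi)))
    """
    s = (b1 - b1_lo) / (b1_hi - b1_lo)
    t = (b2 - b2_lo) / (b2_hi - b2_lo)
    row_lo = corners[0][0] + s * (corners[0][1] - corners[0][0])
    row_hi = corners[1][0] + s * (corners[1][1] - corners[1][0])
    return row_lo + t * (row_hi - row_lo)

# Entries of Tables 1-4 at beta2 = 4.0, 4.2 (rows) and beta1 = 0.50, 0.60 (columns).
CORNERS = {
    "lower 0.5%": ((-2.02, -1.90), (-2.09, -1.97)),  # Table 3
    "lower 5%":   ((-1.43, -1.40), (-1.44, -1.42)),  # Table 1
    "upper 5%":   ((1.80, 1.83), (1.79, 1.81)),      # Table 2
    "upper 0.5%": ((3.28, 3.31), (3.31, 3.34)),      # Table 4
}

# Bailey's moment ratios for n = 80, as quoted in the text.
levels = {
    name: round(bilinear(0.594, 4.114, 0.5, 0.6, 4.0, 4.2, c), 2)
    for name, c in CORNERS.items()
}
print(levels)
```

With these inputs the four interpolated levels agree with the figures quoted in the text for the Pearson-curve approximation.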







Table 1. Lower 5% points of the standardized deviate $(x_\alpha - \mu)/\sigma$, ($\alpha = 0.05$)
(Note that for positive skewness, i.e. $\mu_3 > 0$, the deviates in this table are negative.)

$\beta_2$ \ $\beta_1$ | 0.00 | 0.01 | 0.03 | 0.05 | 0.10 | 0.15 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 | 1.00
1.8 | 1.56 | - | - | - | - | - | - | - | - | - | - | - | - | - | -
2.0 | 1.61 | 1.56 | 1.51 | 1.47 | - | - | - | - | - | - | - | - | - | - | -
2.2 | 1.64 | 1.59 | 1.55 | 1.52 | 1.46 | 1.40 | 1.35 | - | - | - | - | - | - | - | -
2.4 | 1.65 | 1.61 | 1.58 | 1.55 | 1.50 | 1.45 | 1.41 | 1.33 | - | - | - | - | - | - | -
2.6 | 1.65 | 1.61 | 1.59 | 1.57 | 1.53 | 1.49 | 1.45 | 1.38 | 1.30 | - | - | - | - | - | -
2.8 | 1.65 | 1.62 | 1.59 | 1.57 | 1.54 | 1.51 | 1.48 | 1.42 | 1.35 | 1.29 | - | - | - | - | -
3.0 | 1.64 | 1.62 | 1.59 | 1.58 | 1.55 | 1.52 | 1.49 | 1.44 | 1.39 | 1.33 | 1.27 | - | - | - | -
3.2 | 1.64 | 1.61 | 1.59 | 1.58 | 1.55 | 1.53 | 1.50 | 1.46 | 1.42 | 1.37 | 1.31 | 1.25 | 1.19 | - | -
3.4 | 1.64 | 1.61 | 1.59 | 1.58 | 1.55 | 1.53 | 1.51 | 1.47 | 1.43 | 1.39 | 1.35 | 1.30 | 1.24 | 1.19 | -
3.6 | 1.63 | 1.61 | 1.59 | 1.58 | 1.55 | 1.53 | 1.52 | 1.48 | 1.45 | 1.41 | 1.37 | 1.33 | 1.28 | 1.23 | 1.18
3.8 | 1.63 | 1.60 | 1.59 | 1.58 | 1.55 | 1.54 | 1.52 | 1.49 | 1.46 | 1.42 | 1.39 | 1.35 | 1.31 | 1.27 | 1.23
4.0 | 1.62 | 1.60 | 1.59 | 1.57 | 1.55 | 1.54 | 1.52 | 1.49 | 1.46 | 1.43 | 1.40 | 1.37 | 1.34 | 1.30 | 1.26
4.2 | 1.62 | 1.60 | 1.58 | 1.57 | 1.55 | 1.54 | 1.52 | 1.49 | 1.47 | 1.44 | 1.42 | 1.39 | 1.36 | 1.32 | 1.28
4.4 | 1.62 | 1.60 | 1.58 | 1.57 | 1.55 | 1.54 | 1.52 | 1.50 | 1.47 | 1.45 | 1.42 | 1.40 | 1.37 | 1.34 | 1.31
4.6 | 1.61 | 1.59 | 1.58 | 1.57 | 1.55 | 1.54 | 1.52 | 1.50 | 1.48 | 1.45 | 1.43 | 1.41 | 1.38 | 1.35 | 1.33
4.8 | 1.61 | 1.59 | 1.58 | 1.57 | 1.55 | 1.53 | 1.52 | 1.50 | 1.48 | 1.46 | 1.44 | 1.41 | 1.39 | 1.37 | 1.34
5.0 | 1.60 | 1.58 | 1.57 | 1.56 | 1.55 | 1.53 | 1.52 | 1.50 | 1.48 | 1.46 | 1.44 | 1.42 | 1.40 | 1.38 | 1.35

Table 2. Upper 5% points of the standardized deviate $(x_\alpha - \mu)/\sigma$, ($\alpha = 0.95$)

$\beta_2$ \ $\beta_1$ | 0.00 | 0.01 | 0.03 | 0.05 | 0.10 | 0.15 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 | 1.00
1.8 | 1.56 | - | - | - | - | - | - | - | - | - | - | - | - | - | -
2.0 | 1.61 | [illeg.] | [illeg.] | [illeg.] | - | - | - | - | - | - | - | - | - | - | -
2.2 | 1.64 | 1.68 | 1.71 | 1.74 | 1.77 | 1.80 | 1.83 | - | - | - | - | - | - | - | -
2.4 | 1.65 | 1.69 | 1.71 | 1.74 | 1.77 | 1.80 | 1.83 | 1.87 | - | - | - | - | - | - | -
2.6 | 1.65 | 1.68 | 1.71 | 1.73 | 1.76 | 1.79 | 1.81 | 1.86 | 1.90 | - | - | - | - | - | -
2.8 | 1.65 | 1.68 | 1.70 | 1.72 | 1.75 | 1.77 | 1.80 | 1.84 | 1.88 | 1.92 | - | - | - | - | -
3.0 | 1.64 | 1.67 | 1.69 | 1.71 | 1.74 | 1.76 | 1.78 | 1.82 | 1.86 | 1.90 | 1.93 | - | - | - | -
3.2 | 1.64 | 1.67 | 1.69 | 1.70 | 1.73 | 1.75 | 1.77 | 1.80 | 1.84 | 1.87 | 1.91 | 1.94 | 1.98 | - | -
3.4 | 1.64 | 1.66 | 1.68 | 1.69 | 1.72 | 1.74 | 1.76 | 1.79 | 1.82 | 1.85 | 1.88 | 1.92 | 1.95 | 1.98 | -
3.6 | 1.63 | 1.65 | 1.67 | 1.68 | 1.71 | 1.73 | 1.74 | 1.77 | 1.80 | 1.83 | 1.86 | 1.89 | 1.92 | 1.95 | 1.98
3.8 | 1.63 | 1.65 | 1.66 | 1.68 | 1.70 | 1.72 | 1.73 | 1.76 | 1.79 | 1.82 | 1.84 | 1.87 | 1.90 | 1.93 | 1.96
4.0 | 1.62 | 1.64 | 1.66 | 1.67 | 1.69 | 1.71 | 1.72 | 1.75 | 1.78 | 1.80 | 1.83 | 1.85 | 1.88 | 1.90 | 1.93
4.2 | 1.62 | 1.64 | 1.65 | 1.66 | 1.68 | 1.70 | 1.71 | 1.74 | 1.76 | 1.79 | 1.81 | 1.83 | 1.86 | 1.88 | 1.91
4.4 | 1.62 | 1.63 | 1.65 | 1.66 | 1.68 | 1.69 | 1.70 | 1.73 | 1.75 | 1.78 | 1.80 | 1.82 | 1.84 | 1.86 | 1.89
4.6 | 1.61 | 1.63 | 1.64 | 1.65 | 1.67 | 1.68 | 1.70 | 1.72 | 1.74 | 1.76 | 1.78 | 1.80 | 1.82 | 1.84 | 1.87
4.8 | 1.61 | 1.62 | 1.64 | 1.65 | 1.66 | 1.68 | 1.69 | 1.71 | 1.73 | 1.75 | 1.77 | 1.79 | 1.81 | 1.83 | 1.85
5.0 | 1.60 | 1.62 | 1.63 | 1.64 | 1.66 | 1.67 | 1.68 | 1.71 | 1.73 | 1.74 | 1.76 | 1.78 | 1.80 | 1.82 | 1.84










































































































Table 3. Lower 0.5% points of the standardized deviate $(x_\alpha - \mu)/\sigma$, ($\alpha = 0.005$)
(Note that for positive skewness, i.e. $\mu_3 > 0$, the deviates in this table are negative.)

$\beta_2$ \ $\beta_1$ | 0.00 | 0.01 | 0.03 | 0.05 | 0.10 | 0.15 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 | 1.00
1.8 | [illeg.] | - | - | - | - | - | - | - | - | - | - | - | - | - | -
2.0 | [illeg.] | [illeg.] | [illeg.] | [illeg.] | - | - | - | - | - | - | - | - | - | - | -
2.2 | 2.10 | 1.99 | 1.89 | 1.82 | 1.68 | 1.56 | 1.46 | - | - | - | - | - | - | - | -
2.4 | 2.26 | 2.14 | 2.04 | 1.97 | 1.83 | 1.71 | 1.62 | 1.44 | - | - | - | - | - | - | -
2.6 | 2.38 | 2.27 | 2.18 | 2.12 | 1.98 | 1.87 | 1.77 | 1.58 | 1.42 | - | - | - | - | - | -
2.8 | 2.49 | 2.38 | 2.30 | 2.23 | 2.10 | 1.99 | 1.89 | 1.71 | 1.55 | 1.41 | - | - | - | - | -
3.0 | 2.58 | 2.48 | 2.39 | 2.33 | 2.21 | 2.11 | 2.01 | 1.84 | 1.68 | 1.53 | 1.40 | - | - | - | -
3.2 | 2.65 | 2.55 | 2.48 | 2.42 | 2.30 | 2.20 | 2.11 | 1.95 | 1.79 | 1.65 | 1.51 | 1.39 | 1.27 | - | -
3.4 | 2.71 | 2.61 | 2.54 | 2.48 | 2.38 | 2.28 | 2.20 | 2.04 | 1.90 | 1.76 | 1.62 | 1.50 | 1.38 | 1.27 | -
3.6 | 2.76 | 2.67 | 2.60 | 2.54 | 2.44 | 2.35 | 2.27 | 2.13 | 1.99 | 1.85 | 1.72 | 1.60 | 1.48 | 1.37 | 1.27
3.8 | 2.80 | 2.71 | 2.65 | 2.60 | 2.50 | 2.41 | 2.34 | 2.20 | 2.07 | 1.94 | 1.82 | 1.70 | 1.58 | 1.47 | 1.37
4.0 | 2.83 | 2.75 | 2.69 | 2.64 | 2.54 | 2.47 | 2.39 | 2.26 | 2.14 | 2.02 | 1.90 | 1.78 | 1.67 | 1.56 | 1.45
4.2 | 2.87 | 2.79 | 2.72 | 2.68 | 2.59 | 2.51 | 2.44 | 2.32 | 2.20 | 2.09 | 1.97 | 1.86 | 1.75 | 1.65 | 1.54
4.4 | 2.90 | 2.82 | 2.76 | 2.71 | 2.62 | 2.55 | 2.49 | 2.37 | 2.25 | 2.15 | 2.04 | 1.93 | 1.83 | 1.73 | 1.62
4.6 | 2.92 | 2.85 | 2.79 | 2.74 | 2.66 | 2.59 | 2.52 | 2.41 | 2.30 | 2.20 | 2.10 | 2.00 | 1.90 | 1.80 | 1.70
4.8 | 2.94 | 2.87 | 2.81 | 2.77 | 2.69 | 2.62 | 2.56 | 2.45 | 2.35 | 2.25 | 2.15 | 2.05 | 1.96 | 1.87 | 1.77
5.0 | 2.96 | 2.89 | 2.83 | 2.79 | 2.71 | 2.65 | 2.59 | 2.48 | 2.39 | 2.29 | 2.20 | 2.11 | 2.01 | 1.92 | 1.84

Table 4. Upper 0.5% points of the standardized deviate $(x_\alpha - \mu)/\sigma$, ($\alpha = 0.995$)

$\beta_2$ \ $\beta_1$ | 0.00 | 0.01 | 0.03 | 0.05 | 0.10 | 0.15 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 | 1.00
1.8 | [illeg.] | - | - | - | - | - | - | - | - | - | - | - | - | - | -
2.0 | [illeg.] | [illeg.] | [illeg.] | [illeg.] | - | - | - | - | - | - | - | - | - | - | -
2.2 | 2.10 | 2.19 | 2.24 | 2.27 | 2.31 | 2.33 | 2.35 | - | - | - | - | - | - | - | -
2.4 | 2.26 | 2.35 | 2.41 | 2.44 | 2.49 | 2.52 | 2.53 | 2.53 | - | - | - | - | - | - | -
2.6 | 2.38 | 2.48 | 2.54 | 2.57 | 2.63 | 2.66 | 2.68 | 2.70 | 2.69 | - | - | - | - | - | -
2.8 | 2.49 | 2.58 | 2.64 | 2.68 | 2.73 | 2.77 | 2.80 | 2.83 | 2.84 | 2.83 | - | - | - | - | -
3.0 | 2.58 | 2.66 | 2.72 | 2.76 | 2.82 | 2.86 | 2.89 | 2.93 | 2.95 | 2.96 | 2.95 | - | - | - | -
3.2 | 2.65 | 2.73 | 2.79 | 2.83 | 2.89 | 2.93 | 2.96 | 3.01 | 3.04 | 3.06 | 3.07 | 3.06 | 3.04 | - | -
3.4 | 2.71 | 2.79 | 2.85 | 2.88 | 2.95 | 2.99 | 3.02 | 3.07 | 3.11 | 3.13 | 3.15 | 3.16 | 3.15 | 3.14 | -
3.6 | 2.76 | 2.84 | 2.89 | 2.93 | 2.99 | 3.03 | 3.07 | 3.12 | 3.16 | 3.19 | 3.22 | 3.23 | 3.24 | 3.24 | 3.23
3.8 | 2.80 | 2.88 | 2.93 | 2.97 | 3.03 | 3.07 | 3.11 | 3.16 | 3.20 | 3.24 | 3.27 | 3.29 | 3.30 | 3.31 | 3.32
4.0 | 2.83 | 2.91 | 2.96 | 3.00 | 3.06 | 3.10 | 3.14 | 3.20 | 3.24 | 3.28 | 3.31 | 3.34 | 3.36 | 3.37 | 3.38
4.2 | 2.87 | 2.94 | 2.99 | 3.03 | 3.09 | 3.13 | 3.17 | 3.22 | 3.27 | 3.31 | 3.34 | 3.37 | 3.40 | 3.42 | 3.43
4.4 | 2.90 | 2.97 | 3.02 | 3.05 | 3.11 | 3.15 | 3.19 | 3.25 | 3.29 | 3.33 | 3.37 | 3.40 | 3.42 | 3.45 | 3.47
4.6 | 2.92 | 2.99 | 3.04 | 3.07 | 3.13 | 3.17 | 3.21 | 3.27 | 3.31 | 3.36 | 3.39 | 3.42 | 3.44 | 3.47 | 3.50
4.8 | 2.94 | 3.01 | 3.06 | 3.09 | 3.15 | 3.19 | 3.23 | 3.28 | 3.33 | 3.37 | 3.41 | 3.44 | 3.47 | 3.49 | 3.52
5.0 | 2.96 | 3.03 | 3.07 | 3.11 | 3.16 | 3.21 | 3.24 | 3.30 | 3.35 | 3.39 | 3.43 | 3.46 | 3.49 | 3.52 | 3.54




































































Example 2 


If we write

$$m = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$

for the mean deviation in a random sample of observations $x_1, x_2, \ldots, x_n$, drawn from a normal
population, exact expressions for the first two sampling moments of $m$ are known. Using these
formulae for $\bar{m}$ and $\sigma_m$, and some tabulated values for $\beta_1$ and $\beta_2$ derived from expansions
given by Geary (see Editorial Notes, Biometrika, 1946, 1948), we have the following values
for the cases $n = 10$ and 20:

$n$ | $\bar{m}$ | $\sigma_m$ | $\beta_1(m)$ | $\beta_2(m)$
10 | 0.75694 | 0.18945 | 0.1064 | 3.093
20 | 0.77768 | 0.13438 | 0.0513 | 3.045























We may now determine approximations to the lower and upper 0.5% points in these two
sampling distributions, expressed in terms of the population standard deviation as unit.
These will be given by $\bar{m} + f_\alpha \sigma_m$, where the factors $f_\alpha$ for $\alpha$ = 0.005 and 0.995 are obtained
by interpolation from Tables 3 and 4. The results are as follows:

$n$ | $f_{0.005}$ | $f_{0.995}$ | Lower 0.5% point | Upper 0.5% point
10 | -2.24 | 2.86 | 0.333 | 1.299
20 | -2.35 | 2.78 | 0.462 | 1.151

















The values for n = 10 correspond, to three-decimal accuracy, with the exact percentage 
points derived from the probability integral of the mean deviation, as developed by Godwin 
(1946) and computed by Hartley (1946). The probability integral was not calculated beyond 
n = 15, but the accuracy of the approximation at n = 10 suggests that the values given at 
n = 20 should also be satisfactory. 
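The arithmetic of Example 2 can be retraced with a short sketch. The text states only that exact expressions for the first two moments of $m$ are known; the closed forms used below are the standard results for normal samples as recalled here and should be treated as an assumption rather than as the formulae the authors used. The factors $f_\alpha$ are copied from the table in the text, and the function name is hypothetical.

```python
import math

def mean_deviation_moments(n):
    """Mean and standard deviation of the mean deviation m for normal samples
    of size n, in units of the population standard deviation.

    Assumed closed forms (not quoted in the text):
      E(m)   = sqrt(2 (n - 1) / (n pi))
      var(m) = 2 (n - 1) / (n^2 pi)
               * (pi/2 + sqrt(n (n - 2)) - n + asin(1 / (n - 1)))
    """
    mean = math.sqrt(2.0 * (n - 1) / (n * math.pi))
    bracket = (math.pi / 2.0 + math.sqrt(n * (n - 2.0)) - n
               + math.asin(1.0 / (n - 1)))
    var = 2.0 * (n - 1) / (n * n * math.pi) * bracket
    return mean, math.sqrt(var)

# Interpolated factors f_0.005 and f_0.995 as given in the text.
FACTORS = {10: (-2.24, 2.86), 20: (-2.35, 2.78)}

points = {}
for n, (f_lo, f_hi) in FACTORS.items():
    mean, sd = mean_deviation_moments(n)
    # Percentage point = mean + factor * standard deviation.
    points[n] = (round(mean + f_lo * sd, 3), round(mean + f_hi * sd, 3))

print(points)
```

Under these assumptions the computed means agree with the tabulated values of $\bar{m}$, the standard deviations agree to within about one unit in the fifth decimal place, and the resulting 0.5% points reproduce those given above.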


3. METHOD OF COMPUTATION 


The construction of the tables is most easily explained by reference to the accompanying
Fig. 1. It was decided to limit the range of calculation to distributions having $\beta_1 \le 1.0$ and
$\beta_2 \le 5.0$ ($\gamma_2 \le 2.0$).* Further, J and U curves were excluded so that the upper boundary of
the region (in the sense of the chart) is the Type IX line corresponding to curves with equation


$$y = f(x) = y_0 (1 + x)^{m}. \tag{4}$$


* The original calculations were made in connexion with a Type I approximation to the sum of the 
terms of a hypergeometric distribution. The percentage limits were given for Type I curves having 
A; as large as 3-0. 







The tables previously referred to (Pearson, 1925, pp. 439-42 and Tables for Statisticians and 
Biometricians, vol. 2, Tables La-d) gave the standardized percentage limits for Type I 
curves lying in the area between the broken line shown in the chart and the Type III line, 
$2\beta_2 - 3\beta_1 - 6 = 0$. Considerable additional calculations were therefore needed.
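As a check on the equation of the Type III line, the relation can be verified for a gamma curve. The derivation below is a sketch under the assumption of a gamma density with shape parameter $k$ and unit scale (the symbol $k$ is introduced here for illustration); it uses the standard central moments of that density.

```latex
\begin{align*}
\mu_2 &= k, \qquad \mu_3 = 2k, \qquad \mu_4 = 3k^2 + 6k, \\
\beta_1 &= \frac{\mu_3^2}{\mu_2^3} = \frac{4k^2}{k^3} = \frac{4}{k}, \qquad
\beta_2 = \frac{\mu_4}{\mu_2^2} = 3 + \frac{6}{k}, \\
2\beta_2 - 3\beta_1 - 6 &= 6 + \frac{12}{k} - \frac{12}{k} - 6 = 0 .
\end{align*}
```

For example, $k = 10$ gives the point $\beta_1 = 0.4$, $\beta_2 = 3.6$, which lies within the range of the tables.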

















[Fig. 1 chart: horizontal scale of $\beta_1$ from 0.0 to 1.0, vertical scale of $\beta_2$ from 2.0 to 5.0; the normal point is marked at $\beta_1 = 0$, $\beta_2 = 3.0$.]

Fig. 1. Showing the area of the $\beta_1$, $\beta_2$ plane covered by the tables and the framework
of curves used in their calculation.


Details of the earlier calculations have been described by one of us (Pearson, 1925, p. 438).
To extend this work, a framework of $\beta_1$, $\beta_2$ points was used, as shown by small circles in
Fig. 1. At each point the four percentage levels for the appropriate Pearson curve were
computed to four-decimal accuracy. It will be seen that the points selected fall along certain
lines; these lines and the method of procedure are indicated below:


(a) Symmetrical distributions, Types II and VII; by interpolation in tables of the per- 
centage points of Student’s t-distribution, using the appropriate fractional degrees of 
freedom. 

(b) Type IX line; by integration of equation (4) above.

(c) A line of points shown below and nearly parallel to the Type IX line. The corresponding
distributions are those with equations


$$f(x) = x^{1/2}(1 - x)^{q-1}/B(1.5, q), \tag{5}$$


values of $q$ being chosen to make $\beta_2$ = 2.4(0.2)3.8. The lower percentage levels were
then found from the tables of the percentage points of the Incomplete Beta Function
(Catherine M. Thompson, 1941), by interpolation for $\nu_2 = 2q$ in the row with $\nu_1 = 2p = 3$.
For the upper levels the roles of $\nu_1$ and $\nu_2$ were reversed, and $1 - x$ read for $x$.













(d) The levels corresponding to $\beta_1 = 0$, $\beta_2 = 1.8$ were obtained from a rectangle and those
for $\beta_1 = 0.05$, $\beta_2 = 2.0$ by special quadrature.

(e) Type III line; by interpolation in tables of percentage points of the $\chi^2$ distribution.

(f) Type V line; by use of the reciprocal transformation, which converts a Type V into
a Type III variable, and so leads again to the use of the $\chi^2$ tables.

(g) Type IV area; the ordinates of the two series of curves for which circles are shown on
the lines $\beta_1 = 0.2$ and $\beta_1 = 0.5$ were computed and the percentage levels were found by
quadrature.
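Steps (a), (d) and (e) above can be sketched numerically. The block below is an illustration under stated assumptions, not the authors' working: it uses the standard relations $\beta_2 = 3 + 6/(\nu - 4)$ and variance $\nu/(\nu - 2)$ for Student's $t$ (so it covers only the Type VII case $\beta_2 > 3$), $\beta_1 = 8/\nu$ and variance $2\nu$ for $\chi^2$, and a few percentage points of $t$ and $\chi^2$ entered by hand from standard published tables. All function names are hypothetical.

```python
import math

def t_degrees_of_freedom(beta2):
    """Degrees of freedom nu of the Student t curve with kurtosis beta2 > 3."""
    return 4.0 + 6.0 / (beta2 - 3.0)

def standardized_t_point(t_quantile, nu):
    """Rescale a t percentage point to unit standard deviation."""
    return t_quantile * math.sqrt((nu - 2.0) / nu)

def rectangle_point(alpha):
    """Standardized deviate of a rectangular curve (beta1 = 0, beta2 = 1.8)."""
    return (alpha - 0.5) * math.sqrt(12.0)

def chi2_degrees_of_freedom(beta1):
    """Degrees of freedom nu of the chi-square curve with skewness beta1."""
    return 8.0 / beta1

def standardized_chi2_point(chi2_quantile, nu):
    """Centre a chi-square percentage point at nu and scale by sqrt(2 nu)."""
    return (chi2_quantile - nu) / math.sqrt(2.0 * nu)

# (a) Symmetrical case beta1 = 0, beta2 = 4.0, giving nu = 10.
nu_t = t_degrees_of_freedom(4.0)
T_TABLE = {0.95: 1.812, 0.995: 3.169}  # assumed standard t values for 10 d.f.
t_points = {a: round(standardized_t_point(q, nu_t), 2) for a, q in T_TABLE.items()}

# (d) Rectangle: lower 5% point.
rect_lower = round(rectangle_point(0.05), 2)

# (e) Type III case beta1 = 0.4 (beta2 = 3.6), giving nu = 20.
nu_c = chi2_degrees_of_freedom(0.4)
CHI2_TABLE = {0.005: 7.434, 0.05: 10.851, 0.95: 31.410, 0.995: 39.997}  # 20 d.f.
chi2_points = {a: round(standardized_chi2_point(q, nu_c), 2)
               for a, q in CHI2_TABLE.items()}

print(t_points, rect_lower, chi2_points)
```

With these inputs the results agree with the corresponding entries of Tables 1 to 4: 1.62 and 2.83 at $\beta_1 = 0$, $\beta_2 = 4.0$; 1.56 at $\beta_2 = 1.8$; and 1.99, 1.45, 1.80 and 3.16 (the first two taken as negative) at $\beta_1 = 0.40$, $\beta_2 = 3.6$.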


All intermediate values required to complete the tables were then filled in, using this
framework. Where necessary, Lagrangian interpolation formulae were used. It should be noted
that if we take as argument $\sqrt{\beta_1}$, then the upper percentage levels for the curve ($\sqrt{\beta_1}$, $\beta_2$)
become the lower levels for the curve ($-\sqrt{\beta_1}$, $\beta_2$). By use of the additional data made available
in this way interpolation for the lower values of $\beta_1$ was made much easier. The figures in the
tables have been recorded to two decimal places; it is hoped that the final figure is never in
error by more than one unit.


We are grateful to Miss Jean H. Thompson for assistance in some of the calculations. 


REFERENCES 


Bailey, N. T. J. (1950). Biometrika, 37, 193.

Editorial Note (1946). Biometrika, 33, 252; and correction in Biometrika (1948), 35, 424.

Godwin, H. J. (1946). Biometrika, 33, 254.

Hartley, H. O. (1946). Biometrika, 33, 257.

Hartley, H. O. & Pearson, E. S. (1942). Biometrika, 32, 301.

Pearson, E. S. (1925). Biometrika, 17, 388.

Pearson, E. S. (1932). Biometrika, 24, 404.

Thompson, Catherine M. (1941). Biometrika, 32, 151.







REGRESSION, STRUCTURE AND FUNCTIONAL RELATIONSHIP. 
PART I 


By M. G. KENDALL 


1. In this paper I shall try to remove a number of obscurities which have arisen in the 
past few years about the connexion between regression, functional relationship and 
structural equations in theoretical models containing a stochastic element. Some of these 
obscurities arise from notation; for example, we sometimes write $y = ax$ to denote both
a purely mathematical relationship and a regression relationship, or we write $y = x + \epsilon$ while
thinking of $y$ and $x$ as mathematical variables and $\epsilon$ as a random variate, and either procedure
can give rise to nonsensical results. But there are also sources of confusion which are not 
purely notational and lead, for instance, to differences of approach in econometrics through 
‘errors in variables’ or ‘errors in equations’. Related expressions which give rise to mis- 
understanding are the ‘fitting of lines when both variables are subject to error’, the ‘fixed 
variable’ of regression analysis and the ‘predetermined variable’ of econometrics. 


2. Let us, in the first place, go back to consider the concept of regression as understood 
by Galton, Karl Pearson and Yule. To them a regression line expressed a property of a 
bivariate population. It summarized one of the features of joint relationship in very much 
the same way as the mean summarizes the properties of a univariate distribution. If x and 
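$y$ represent random variates with a frequency function $f(x, y)$ the series of parameters
corresponding to particular values of $x$, say $X$,


$$\mu'_r(X) = \int_{-\infty}^{\infty} y^r f(X, y)\,dy \Big/ \int_{-\infty}^{\infty} f(X, y)\,dy \tag{1}$$

bears an obvious relation to the moments in the univariate case and the curve

$$Y = \mu'_r(X) = \int_{-\infty}^{\infty} y^r f(X, y)\,dy \Big/ \int_{-\infty}^{\infty} f(X, y)\,dy \tag{2}$$


expresses the way in which the $r$th moment of $y$ varies according to assigned values of $x$. In
the particular case when $r = 1$ we have the so-called regression line of $y$ on $x$


$$Y = \int_{-\infty}^{\infty} y f(X, y)\,dy \Big/ \int_{-\infty}^{\infty} f(X, y)\,dy. \tag{3}$$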


I shall throughout use capitals such as X, Y to denote only mathematical variables such as 
current co-ordinates and x, y to denote random variates; and I distinguish between the 
variable, which means a mathematical variable in the ordinary sense, and a variate, which is 
associated with a frequency or probability distribution of possible values. 
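
As a small numerical illustration of (1) to (3), the sketch below replaces the integrals by sums over a discrete frequency table. The table `joint` and the helper name `conditional_moment` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from fractions import Fraction

def conditional_moment(joint, X, r=1):
    """r-th moment of y for an assigned value X of x, from a discrete table.

    `joint` maps (x, y) pairs to frequencies; the two sums play the part of
    the integrals in the numerator and denominator of equation (1).
    """
    num = sum(Fraction(y) ** r * f for (x, y), f in joint.items() if x == X)
    den = sum(f for (x, y), f in joint.items() if x == X)
    return num / den

# Hypothetical toy frequency table (an assumption, not data from the paper).
joint = {(0, 0): 2, (0, 1): 2, (1, 1): 1, (1, 3): 3}

# The regression of y on x is the case r = 1, evaluated at each assigned X.
regression_curve = {X: conditional_moment(joint, X, 1) for X in (0, 1)}
```

Here `regression_curve` is a function of the mathematical variable X only; the frequencies with which the values of x occur enter through `joint` and are not part of the curve itself.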


3. The regression line (Y, X), from this point of view, is in no sense a functional relation- 
ship between y and x. We may regard it as a functional relationship between Y and X; given 
any value X the integral on the right in equation (3) determines a value of Y. But taken in 
conjunction with the underlying frequency distribution, equation (3) is more than a functional 
relation; for we are to regard some values of x as occurring with greater probability than 
others.


4. The point becomes clear when we examine a statement which is often made in text- 
books of statistics (including, I regret to say, one of my own) that when the correlation 













coefficient is unity there is a functional relation between the variables. This is true only in 
a rather special sense. There is merely a functional relation between the variates. If we have 
two (mathematical) variables Y and X defined in a certain range, it is statistical nonsense 
to speak of their distribution and still more so to speak of their variance and covariance or 
of their correlation. A variable has no frequency function associated with it. We can, of course, 
associate such a function and sometimes we do so; but in the act of doing so we turn the 
(mathematical) variables Y and X into (statistical) variates x and y. If the product-moment
correlation of these variates x and y is unity there is a linear relation between x and y. One 
statistical variate, measured about its mean, is then a constant times the other; but there is 
still an underlying frequency function common to both. To put it geometrically, if a bivariate 
frequency surface shrinks or collapses into one dimension there remains a univariate frequency 
curve which expresses the relative frequency or probability with which values of the variate 
arise. All that has happened is that two-way variation has contracted into one-way variation 
on a scale which is functionally connected with the two original scale variables X and Y. It 
will be seen later, I hope, that I am not splitting hairs. 


5. I repeat then, that to the founders of regression theory the regression curves were 
summarizing aspects of an underlying multivariate distribution. This outlook dominated 
orthodox statistical thought until the early 1920’s. At that stage we may detect several lines 
of departure; or perhaps it would be better to say, several extraneous lines of development 
converging on to the topic of regression. The fitting of curves to series of points when both 
variables are subject to error; the development of component- and factor-analysis; the type 
of problem which led Frisch to introduce confluence analysis; these and other subjects aim 
in common at the production of functional relationships, in some sense or other, between 
a complex of variables or variates, and they have all become entangled more or less with the 
regression concept. The change in outlook on regression analysis itself was subtle and is 
difficult to trace in the literature because it never (so far as I know) found explicit expression. 
There was, however, such a change. It may perhaps be brought out most clearly by an 
examination of what is meant by a ‘fixed variable’, or, as I should prefer to say, a ‘fixed’ 
variate. 


6. We suppose ourselves given a bivariate population specified according to two variates 
x and y. For non-statistical reasons we regard one as ‘dependent’, the other as an ‘independent’
variate. These terms are themselves misleading and tendentious from the older
point of view, a fact which has led to the introduction of other words with a similar (though 
not, apparently, an identical) connotation such as ‘explanatory’ or ‘predetermined’ 
variable.* I do not like adding to the terminology of an overburdened subject, but to be 
absolutely clear I shall, at least for the purposes of this article, use a new word. The variate 
x which enters into the right-hand side of equations of type (1) will be called the predicated 
variate. The variate y, which also enters into the right-hand side but is integrated out, will 
be called the unpredicated variate. For multiple variation we may have more than one 
predicated variate, but there will never be more than one unpredicated variate.† The
corresponding variables Y and X may also be said to be unpredicated or predicated
respectively.

* In the theory of stochastic processes, when a sequence of random functions depends on a variable t,
this is also referred to sometimes as the ‘independent variable’.

† Regression theory may be extended to the regression of vectors on vectors, in which case it might
be said that we have several unpredicated scalars making up the unpredicated vector.







With this convention, we may say that the object of regression analysis for two variates 
y and x is to express the mean of the unpredicated variate y as a function of the predicated
variable X; or, reciprocally, the mean of x as a function of Y. 

7. In the classical approach questions of sampling usually arose in circumstances where 
the sampled members were chosen by a method which did not depend on the values of the 
variates they bore. There was no question of errors of measurement. The sample was drawn
or given, the values of the variates determined for each member, one variate selected as the 
predicated variate for non-statistical reasons, and the regression line then estimated. In more 
recent developments, however, it has been usual to regard the values of the predicated variate 
as determinate before the sample is drawn; that is to say, we fix certain values of X and 
observe the corresponding values of y. Instead of taking members at random with respect 
to both variates we assign values of one and sample at random with respect to the other. 

8. It is easy to imagine circumstances where this procedure has decisive advantages and 
even where it is practically necessary and, conversely, where it has decisive disadvantages or 
is impossible. 

(a) If we know that the regression is linear and have only a few observations to make, it 
may be better to arrange for them to be made near the two tails of the x-distribution instead 
of near the mean, which would give less accuracy to the sample regression line. 

(b) We may be compelled to choose members according to the values of one of the variates 
under consideration. In an inquiry into the relation between personal income and personal 
savings, for example, we may require to stratify by the income variate. In considering the 
relation between performance in final examinations and performance at entrance examina- 
tions in a group of students we often have only a truncated part of the predicated variate 
(entrance performance) because students who fail the entrance do not proceed to the final 
examination. 

(c) On the other hand, if pre-selection of values of the predicated variate destroys or 
seriously impairs the validity of significance tests we may purchase extra accuracy too 
dearly. We then arrive at the familiar dilemma in the systematization of design—whether 
we prefer to be closer to the truth without knowing how close, or to run the risk of being 
further from it while knowing how far away we are likely to be. 

(d) It should be observed that I do not adduce as an example the case occurring frequently 
in physics where we assign values of one variable and measure the values of the other; as, 
for example, in a laboratory experiment on Hooke’s law, where we hang pre-selected weights 
on a spring and measure the extension. This looks like a case where the weight variable is 
pre-selected completely. But, in fact, it does not fall within the ambit of regression analysis 
at all unless there are no errors in the determination of the weights. 
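
Point (a) can be made concrete with a short sketch. It assumes the standard result that, for a linear regression with independent residuals of constant variance, the sampling variance of the fitted slope is the residual variance divided by the sum of squared deviations of the chosen X values. The function name and the two sets of X values are hypothetical.

```python
def slope_variance(xs, sigma2=1.0):
    """Sampling variance of the least-squares slope for assigned values xs.

    Assumes a linear regression with independent residuals of constant
    variance sigma2 (an assumption of this sketch).
    """
    mean = sum(xs) / len(xs)
    return sigma2 / sum((x - mean) ** 2 for x in xs)

# Two hypothetical designs with four observations each.
near_mean = [-0.5, -0.25, 0.25, 0.5]    # X values clustered round the mean
near_tails = [-2.0, -1.5, 1.5, 2.0]     # X values placed towards the tails

var_near_mean = slope_variance(near_mean)
var_near_tails = slope_variance(near_tails)
```

With these values the design near the tails gives a much smaller slope variance, which is the gain in accuracy referred to in (a). The gain rests entirely on the assumed linearity.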

9. Let us suppose for simplicity that the regression of y on x is linear and that the variates
are measured about their mean. The regression line is then

$$Y = \beta_0 + \beta_1 X, \tag{4}$$

where $\beta_0$ and $\beta_1$ are unknown constants. We can also write

$$y_X = \beta_0 + \beta_1 X + \epsilon_X, \tag{5}$$

meaning that the random variate $y_X$ depends on a mathematical variable X, two unknown
constants $\beta_0$ and $\beta_1$ and a random variate $\epsilon$ which also depends on X. What we should not
write for the regression line is an expression such as

$$y = \beta_0 + \beta_1 x + \epsilon. \tag{6}$$













This means something quite different, namely, that y is the weighted sum of two random
variates x and $\epsilon$. It also seems to me confusing in this context, if not actually improper, to
introduce the ‘conditional’ variate y|x and write

$$y \mid x = \beta_0 + \beta_1 x + \epsilon \mid x, \tag{7}$$

which means that x is a random variate, that for any value of x, y|x and $\epsilon$|x are random
variates, and that y|x is a weighted sum of x and $\epsilon$|x.

The difference between (5) and (7) is that (5) asserts something about a random variate
y and a variable X, whereas (7) asserts something about a pair of random variates y and x.


10. Consider now the problem of estimating the constants $\beta_0$ and $\beta_1$ in (4) from a random
sample of values of y and x. We have two cases to consider: (1) when the sample is drawn so 
that values of y and x both occur at random, (2) when it is drawn by specifying values of 
x and only y varies at random. 

To make any headway at all we require certain further assumptions. We may be content 
to rely on the principle of least squares, without looking for any deeper justification of that
principle, and hence to minimize the sum of squares

$$\sum_{j=1}^{n} (y_j - b_0 - b_1 x_j)^2,$$

which leads to the familiar estimators

$$b_0 = 0, \qquad b_1 = \frac{\sum xy}{\sum x^2}, \tag{8}$$
both variates being measured about their observed mean. There is something to be said for 
accepting the principle of least squares in its own right (cf. Lindley, 1947). If, however, we 
do not accept least squares as a basic principle of estimation we must make certain assumptions
about the random variate $\epsilon_X$. What we do in practice is to suppose (1) that $\epsilon_X$ is
independent of X; (2) that it is a normal variable with zero mean and variance, say $\sigma^2$;
(3) that the observed values of $\epsilon_X$ are independent of each other. The joint distribution of
a set of n $\epsilon$’s, whatever the values of the X’s, is then

$$\frac{1}{(2\pi)^{n/2}\sigma^{n}} \exp\left\{-\frac{1}{2\sigma^2}\sum_{j=1}^{n}\epsilon_j^2\right\} d\epsilon_1 \dots d\epsilon_n. \tag{9}$$

We now use (5) to transform this to the random variables $y_X$ and obtain

$$\frac{1}{(2\pi)^{n/2}\sigma^{n}} \exp\left\{-\frac{1}{2\sigma^2}\sum_{j=1}^{n}(y_{X,j} - \beta_0 - \beta_1 X_j)^2\right\} dy_{X,1} \dots dy_{X,n}. \tag{10}$$

We then use the principle of maximum likelihood to determine $\beta_0$ and $\beta_1$ so as to minimize
the sum inside the braces, which leads us back to (8). It does not matter if we are ignorant
of the value of $\sigma$.
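
The estimators (8) can be sketched in a few lines. The function below centres both variates on their observed means, as the text requires, and then forms the ratio of the sum of products to the sum of squares. It also reports the intercept in the original units; for the centred variates the intercept is zero. The function name and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Least-squares intercept and slope for observed pairs (x, y).

    The slope is sum(xy) / sum(x^2) with both variates measured about
    their observed means, as in equation (8).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]          # x measured about its mean
    dy = [y - y_bar for y in ys]          # y measured about its mean
    b1 = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    b0 = y_bar - b1 * x_bar               # intercept in the original units
    return b0, b1

# Hypothetical observations, used only to exercise the function.
b0, b1 = least_squares_line([0, 1, 2, 3], [1, 3, 2, 6])
```

The same arithmetic applies whether the x values were assigned beforehand or arose at random; the argument in the text concerns the justification of the estimator, not its computation.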


11. This process is legitimate only if we use equation (5) to transform from the random
variable element $d\epsilon_1 \dots d\epsilon_n$ to the element $dy_{X,1} \dots dy_{X,n}$. It is not legitimate if we use (6)
or (7), for we are not entitled to express an n-fold element $d\epsilon_1 \dots d\epsilon_n$ in terms of an element
comprising 2n variates, n y’s and n x’s. The method therefore provides a solution only in the
case when values of X are given. What can be said about the alternative case when y and
x are both random variates?







The usual answer to this question is that ‘we may regard the x’s as fixed’ by an argument
of the following type. Suppose the joint distribution of y and x is f(x, y). The probability of
the observed sample is then

$$P = f(x_1, y_1) \dots f(x_n, y_n)\,dx_1 \dots dx_n\,dy_1 \dots dy_n. \tag{11}$$

Transform to new random variates by the equations

$$\xi = x, \qquad y = \beta_0 + \beta_1 \xi + \epsilon. \tag{12}$$

The Jacobian of the transformation is

$$\frac{\partial(x, y)}{\partial(\xi, \epsilon)} = 1,$$

and hence

$$P = f(\xi_1, \beta_0 + \beta_1\xi_1 + \epsilon_1) \dots f(\xi_n, \beta_0 + \beta_1\xi_n + \epsilon_n)\,d\xi_1 \dots d\xi_n\,d\epsilon_1 \dots d\epsilon_n. \tag{13}$$

If this splits into two factors such that the $\epsilon$-distribution is independent of the $\xi$-distribution
we have

$$f(\xi, \beta_0 + \beta_1\xi + \epsilon) = g(\xi)\,h(\epsilon), \tag{14}$$

where g and h are frequency functions. This is equivalent to

$$f(x, y) = g(x)\,h(y - \beta_0 - \beta_1 x). \tag{15}$$

If and only if this condition is satisfied we may estimate $\beta_0$ and $\beta_1$ from $h(\epsilon)$ alone. In
particular, if $\epsilon$ is distributed normally the maximum likelihood solution leads back to (9)
and thence to (8). Thus, as the random origin of the x’s does not affect the estimation we can
regard them as the assigned values of a variable or as ‘fixed’.


12. I cannot see any other way of arriving at the desired result unless we import yet 
another principle of inference, which is usually a confession of failure and in any case is not 
to be done where avoidable. The points to be noticed are: (1) that we have assumed the $\epsilon$’s to be
independent of the x’s and (2) that we have used in equations (12) a transformation depending
on the view that both x and y are random variates. It is usually stated that ‘by regarding
the x’s as fixed’ we avoid making any assumption about the distribution of x; and, indeed,
it is sometimes implied that the estimation remains valid whatever the distribution of x. This
is true when the hypothesis as to linearity is satisfied, but not necessarily in alternative cases.

In questions of estimation we are mainly concerned with the distribution of the $\epsilon$’s. Where
tests of significance are required we are interested in the distribution of the residuals about 
a fitted regression line, which is quite another thing. 


13. Where the values of the predicated variate are decided before the sample is drawn 
I shall call the variate determined, in the contrary case undetermined. The latter is the 
classical case; the former is the type more frequently considered of recent years, particularly 
since Fisher’s (1922) derivation of the t-test for regression coefficients. 

It will be observed that in the determined case the problem of finding a line of type
$Y = \beta_0 + \beta_1 X$ is not a bivariate problem at all. To make the contrast as sharp as possible
we might even say that it is not a regression problem as understood by Galton and Pearson.
It considers, not a pair of random variates, but a single random variate $y_X$ depending on
a mathematical* variable X. To arrive at the ordinary estimate (8) of the coefficients $\beta_0$ and
$\beta_1$ it assumes that the residual $\epsilon_X$ is normal with constant variance. We also require the

* I should have used the customary word ‘parameter’ here to denote X but for the fact that
‘parameter’ has a special meaning in statistics. One sometimes feels that a Spirit of Perversity has been at
work on statistical nomenclature in order to create difficulty and confusion.










assumption that under the system which generated the sample, values of $y_X$ for the different
values of X were chosen independently, or equivalently that $\epsilon_X$ for any particular value of
X is independent of $\epsilon_X$ for other values.


14. In the undetermined case we require the condition expressed by equation (15). It 
follows that to arrive at the estimator (8) we cannot in every case regard the predicated 
variate as determined but that we may do so if (15) is satisfied. In that case it follows that 
the regression is linear. For any fixed X we have 


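$$\begin{aligned}
E(y) &= \int_{-\infty}^{\infty} y\,h(y - \beta_0 - \beta_1 X)\,dy \\
&= \int_{-\infty}^{\infty} (y - \beta_0 - \beta_1 X)\,h(y - \beta_0 - \beta_1 X)\,dy + (\beta_0 + \beta_1 X)\int_{-\infty}^{\infty} h(y - \beta_0 - \beta_1 X)\,dy \\
&= \beta_0 + \beta_1 X,
\end{aligned}$$

provided that the mean $\int_{-\infty}^{\infty} t\,h(t)\,dt$ is zero, which we may take to be so without loss of generality.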
It is of some interest to examine the consequences of (15) more closely. 
We have 


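$$\begin{aligned}
\iint e^{it_1 x + it_2 y} f(x, y)\,dx\,dy &= \iint e^{it_1 x + it_2 y} g(x)\,h(y - \beta_0 - \beta_1 x)\,dx\,dy \\
&= \iint e^{it_1 x + it_2(\beta_0 + \beta_1 x)}\,e^{it_2(y - \beta_0 - \beta_1 x)} g(x)\,h(y - \beta_0 - \beta_1 x)\,dx\,dy,
\end{aligned}$$

or

$$\phi(t_1, t_2) = \phi_g(t_1 + t_2\beta_1)\,\phi_h(t_2)\,e^{i\beta_0 t_2}, \tag{16}$$

where $\phi$, $\phi_g$ and $\phi_h$ are the respective characteristic functions of f, g and h. Taking logarithms
and assuming the existence of cumulants of all orders we have

$$\sum \frac{\kappa_{rs}(it_1)^r (it_2)^s}{r!\,s!} = it_2\beta_0 + \sum \frac{\kappa'_r\,\{i(t_1 + t_2\beta_1)\}^r}{r!} + \sum \frac{\kappa''_r (it_2)^r}{r!}, \tag{17}$$

where $\kappa$, $\kappa'$ and $\kappa''$ refer respectively to f, g and h. From the coefficient of $t_1^r$ we have

$$\kappa_{r0} = \kappa'_r \quad (r > 0), \tag{18}$$

and from the coefficient of $t_1^r t_2$ we have

$$\kappa_{01} = \beta_0 + \beta_1\kappa'_1 \quad (r = 0), \tag{19}$$

$$\kappa_{r1} = \beta_1\kappa'_{r+1} \quad (r \ge 1). \tag{20}$$

Thus we have

$$\beta_1 = \frac{\kappa_{11}}{\kappa_{20}}, \tag{21}$$

$$\frac{\kappa_{r1}}{\kappa_{11}} = \frac{\kappa_{r+1,0}}{\kappa_{20}} \quad (r \ge 1). \tag{22}$$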

Equation (22) is Wicksell’s condition for linearity of regression (cf. Kendall, 1948, p. 335), 
which under certain conditions is necessary as well as sufficient. 

We notice that from equation (21) the constant $\beta_1$ is given as the ratio of the covariance of
y and x to the variance of x in the parent population. The estimator (8) is therefore consistent.
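
The consistency claim can be checked by simulation. The sketch below is an illustration only: it assumes a skew (exponential) distribution g for x and a normal residual independent of x, so that the factorization (15) holds, and the parameter values, seed and function names are all hypothetical.

```python
import random

def ratio_estimate(xs, ys):
    """Sample covariance of x and y divided by the sample variance of x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var = sum((x - x_bar) ** 2 for x in xs)
    return cov / var

def simulate_slope(n, beta0=1.0, beta1=2.0, sigma=0.5, seed=0):
    """Draw an undetermined sample satisfying (15) and estimate the slope."""
    rng = random.Random(seed)
    xs = [rng.expovariate(1.0) for _ in range(n)]             # non-normal g(x)
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    return ratio_estimate(xs, ys)

# A large sample should give an estimate close to the assumed beta1 = 2.
estimate = simulate_slope(20000)
```

No assumption of normality is made about x here; only the independence of the residual from x and the linearity of the regression are used.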


15. We may, then, regard the undetermined case as equivalent to the determined case 


only if the parent regression is linear. This is irrespective of any assumption about normality 
in the residuals. The conclusion is in accordance with common-sense requirements; for if 







the regression is not linear, where we fix the values of X clearly makes a difference (apart 
from sampling effects) to the line we shall find. 
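
A noise-free example shows the effect. In the sketch below the true relation is taken to be the quadratic Y = X², an assumption made purely for illustration, and a straight line is fitted by least squares to two different sets of assigned X values.

```python
def fitted_slope(xs, ys):
    """Least-squares slope of y on x, both measured about their means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

def true_curve(x):
    """Hypothetical curvilinear regression, free of sampling error."""
    return x * x

low_xs = [0, 1, 2]      # X values fixed in one part of the range
high_xs = [2, 3, 4]     # X values fixed in another part of the range

slope_low = fitted_slope(low_xs, [true_curve(x) for x in low_xs])
slope_high = fitted_slope(high_xs, [true_curve(x) for x in high_xs])
```

The two fitted slopes differ (2 against 6) although no sampling error is present, so the line found depends on where the values of X are fixed.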


16. In order to make further progress in the matter of estimation in the undetermined case 
we still require the assumption of normality in the residuals. All the information available 
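about $\beta_1$ is contained in the function $h(\epsilon_1) \dots h(\epsilon_n)\,d\epsilon_1 \dots d\epsilon_n$, and if we make the usual
assumptions and use the method of maximum likelihood we return once again to equation
(8). It may be noted that in this case $\kappa''_r = 0$ for r > 2 and we shall have from (17)

$$\kappa_{rs} = \beta_1^s \kappa'_{r+s} \quad (r \ne 0), \tag{23}$$

$$\kappa_{0s} = \beta_1^s \kappa'_s \quad (s \ne 2), \tag{24}$$

$$\kappa_{02} = \beta_1^2 \kappa'_2 + \kappa''_2. \tag{25}$$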


17. So much for estimation. Somewhat similar considerations apply to the use of the 
t-test of significance of a regression coefficient with one predicated variable. In the determined 
case we derive the test on the basis of fixed values of X. In the undetermined case we may 
use the test in the same way if and only if the parent regression is linear and the residuals are 
normal without making any further assumption about the distribution of x. In such a case 
the test may be derived directly without passing the argument, so to speak, through the 
determined to the undetermined case; I have given the derivation in the particular case of 
joint bivariate normality (Kendall, 1948, p. 348).


18. In applying the t-test to a regression coefficient or to the difference of two regression 
coefficients we are thus (if the test is to have any power) postulating linearity of regression 
in the alternative hypotheses under test. Furthermore, we require joint independence of the 
residuals $\epsilon$ and the x’s. Such conditions do not hold, for example, in autoregressive series,
or, more generally, in any stochastic scheme containing lagged terms. Nor will they hold in 
time-variation where the successive values of x are auto-correlated owing to a time-effect, 
even though the correlation be a ‘nonsense’ one. For this reason the estimation and testing 
of significance of regression coefficients in time series differs from the ordinary case and 
special methods are necessary. 


19. Let us consider the question from the point of view of ‘conditional’ inference. The 
argument here, as I understand it, is as follows: We draw an undetermined sample by a random
process. This gives us certain values $x_1, \dots, x_n$. We decide, before drawing the sample, that
whatever these values prove to be, we will make an inference in the restricted population of
values (y, x) for which only these values $x_1, \dots, x_n$ appear. We do not fix the values of x in
advance in the sense of the determined case; but we decide that we will regard them as fixed 
and make a conditional inference based on that assumption. 

In general, this type of procedure has certain dangers and disadvantages, but I do not 
propose to discuss it in generality. For present purposes it is enough to note that the con- 
ditional distribution of t is the same in form whatever the values of x (on the usual assumption 
about linearity of regression and normality of residuals) and hence that we can make a series 
of inferences in the t-distribution for different sets of x’s without having to confine ourselves
to one particular set. In this context, I think, the ‘conditional’ approach is only another way
of expressing the approach through the determined to the undetermined case. But it is not
always so, and I repeat that equation (7) is not the same as equation (5). In equation (5) we
select the values of X beforehand. In equation (7) we let chance select them and then confine 
our attention to the set so selected. The logic of the inference is different. 



20. I interpolate here a comment to link up the ‘determined’ regression approach with 
factor analysis and the analysis of variance. In variance analysis it is customary to consider 
a variate y which may belong to one of i families as a systematic plus a residual component 
in some such form as

$$y_i = \beta_i + \epsilon_i. \tag{26}$$

This is similar in form to the regression equation

$$y_X = \beta_X + \epsilon_X,$$

but differs in that the families i need no longer be representable on a continuous (and
orderable) scale and that we do not impose any relation of linearity on the constants $\beta_i$ such
as is involved in the regression approach. To make a significance test, however, we do require
the same assumption about normality and independence in the residuals $\epsilon$.


21. In factor analysis we postulate a scheme similar to (26), but the psychologist has, as 
a rule, not felt able to make the assumption about the normality, independence and equal 
variance of $\epsilon_i$ for all i. He regards $\epsilon_i$ as a specific term peculiar to the ith family. This involves
him in the very serious trouble that his equations are then essentially indeterminate,
a difficulty which has not been satisfactorily resolved (cf. Kendall, 1950). The statistician
working in other subjects can hardly afford to throw stones at the factor analyst on this
account; for the statistician avoids the difficulty only by ‘assuming it away’, namely, by
postulating qualities in his residual terms $\epsilon$. Sometimes he has prior evidence to support the
reasonableness of such a postulation. But it is fair comment, I think, to say that a good many 
users of variance analysis and regression analysis make the assumption merely because they 
cannot otherwise obtain an answer and without examining whether it is the right or 
reasonable thing to do. 


22. Consider now the case of one unpredicated and p predicated variates $x_1, \dots, x_p$. We may
write the linear regression as

$$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p. \tag{27}$$

Once again we may regard the case as completely determined when the X’s are specified
and write

$$y_{X_1, \dots, X_p} = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \epsilon_{X_1, \dots, X_p}. \tag{28}$$

We may also have a completely undetermined case when no X is specified, and a partially
determined case when some variates are determined and some are not.

To estimate the $\beta$’s from a sample of n in the completely determined case we assume that
$\epsilon_{X_1, \dots, X_p}$ is independent of the X’s, has zero mean and is normal with variance $\sigma^2$, and that
values of $\epsilon$ are independent among themselves. This leads us, via the method of maximum
likelihood, to the series of equations (where $x_0$ is a dummy variable equal to unity)

$$\sum y x_j - \beta_0 \sum x_0 x_j - \beta_1 \sum x_1 x_j - \dots - \beta_p \sum x_p x_j = 0 \quad (j = 0, \dots, p). \tag{29}$$
Let us consider the case when the predicated variates are completely undetermined. 
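23. In the manner of §11 we transform by the equations

$$\begin{aligned}
\xi_j &= x_j \quad (j = 0, \dots, p), \\
\epsilon &= y - \beta_0 - \beta_1 x_1 - \dots - \beta_p x_p.
\end{aligned} \tag{30}$$

The transformation has a unit Jacobian and if $f(y, x_1, \dots, x_p)$ is the frequency function of y and
the x’s we find, for the probability of the sample, a product of n terms like

$$f\Big(\epsilon + \sum \beta_j \xi_j,\ \xi_1, \dots, \xi_p\Big)\,d\epsilon\,d\xi_1 \dots d\xi_p. \tag{31}$$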







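To treat the x’s as determined variates we shall require a factorization

$$f\Big(\epsilon + \sum \beta_j \xi_j,\ \xi_1, \dots, \xi_p\Big) = g(\xi_1, \dots, \xi_p)\,h(\epsilon), \tag{32}$$

which is equivalent to

$$f(y, x_1, \dots, x_p) = g(x_1, \dots, x_p)\,h\Big(y - \sum \beta_j x_j\Big). \tag{33}$$

We find a similar factorization of the characteristic functions

$$\phi(u, t_1, \dots, t_p) = e^{i\beta_0 u}\,\phi_g(t_1 + \beta_1 u, \dots, t_p + \beta_p u)\,\phi_h(u), \tag{34}$$

where u refers to y and $t_j$ to $x_j$.
In terms of cumulants we have

$$\sum \frac{\kappa_{r, s_1, \dots, s_p}(iu)^r (it_1)^{s_1} \dots (it_p)^{s_p}}{r!\,s_1! \dots s_p!} = iu\beta_0 + \sum \frac{\kappa'_{s_1, \dots, s_p}\,\{i(t_1 + \beta_1 u)\}^{s_1} \dots \{i(t_p + \beta_p u)\}^{s_p}}{s_1! \dots s_p!} + \sum \frac{\kappa''_r (iu)^r}{r!}. \tag{35}$$

Considering those terms where u appears only as the first power and taking $\kappa''_1 = 0$ we find

$$\kappa_{1, s_1, \dots, s_p} = \beta_1 \kappa'_{s_1+1, s_2, \dots, s_p} + \beta_2 \kappa'_{s_1, s_2+1, \dots, s_p} + \dots + \beta_p \kappa'_{s_1, \dots, s_p+1} \tag{36}$$

as a generalization of (19). In particular, taking $s_j = 1$ and all the other s’s as zero, we have

$$\kappa_{1, 0, \dots, 1, \dots, 0} = \beta_1 \kappa'_{1, 0, \dots, 1, \dots, 0} + \beta_2 \kappa'_{0, 1, \dots, 1, \dots, 0} + \dots + \beta_p \kappa'_{0, \dots, 1, \dots, 1}, \tag{37}$$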


which are the standard equations (29) for the parent population and show that in any case 
equations of type (29) are consistent. 


24. To give the estimation equations (29) more than consistency we shall again require 
assumptions of normality of the usual type. From (33) we see that h continues to embody
all the available information about the $\beta$’s, and a maximum likelihood solution of type (29)
can be produced from it if we require that $\epsilon$ shall be a normal variate with variance $\sigma^2$
independently of the x’s and that the various values of $\epsilon$ are independent. As in the case of
one predicated variate, we may take the predicated variates as determined if and only if the 
regression is linear; and to produce the estimating equations we require the residual to be 
normal. The application of the t-test follows under the usual assumptions. 


25. We next remove the restriction as to linearity by observing that $x_1, \dots, x_p$ may in
particular cases be functionally dependent and could, for example, be the powers of a single
variate $x_1, x_1^2, \dots, x_1^p$. Clearly, however, in such a case, by determining one variate we
determine them all. This extension does not contradict the statement in §15 that we may
regard the undetermined case as equivalent to the determined case only if the regression is
linear. What §15 implies is that if we are estimating the constant $\beta_1$ in the equation
$Y = \beta_0 + \beta_1 X$ we may regard x as a determined variate only if the parent regression really is
linear. If it really is curvilinear, of the form, say,

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2,$$

then we may regard x as a determined variate only if we estimate $\beta_1$ and $\beta_2$ together.
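
Estimating the coefficients together amounts to solving normal equations of type (29) with the powers of a single variate as the predicated variates. The sketch below does this for a quadratic, using a small Gaussian elimination written out in full so that it needs no external library. The data are hypothetical and free of error, generated from Y = 1 + 2X + 3X², and all function names are illustrative.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                    # back substitution
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z

def normal_equations_fit(rows, ys):
    """Coefficients solving sum(y x_j) = sum_k beta_k sum(x_k x_j), as in (29)."""
    p = len(rows[0])
    a = [[sum(row[j] * row[k] for row in rows) for k in range(p)] for j in range(p)]
    b = [sum(y * row[j] for row, y in zip(rows, ys)) for j in range(p)]
    return solve(a, b)

xs = [-2, -1, 0, 1, 2]
ys = [1 + 2 * x + 3 * x * x for x in xs]              # hypothetical exact quadratic
design = [[1.0, x, x * x] for x in xs]                # dummy variate, X and X squared
coeffs = normal_equations_fit(design, ys)
```

The first column of `design` is the dummy variable equal to unity, so `coeffs` holds the constant term followed by the coefficients of X and X².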

26. Up to this point nothing particularly new has emerged from the treatment except 
perhaps some clarification and the extensions of equation (36). I shall resume the line of 
development later in Part II of the paper in a discussion of structure in stochastic systems. 
At this stage I turn to a consideration of the problem of functional relationship and the method 
of attack developed by Frisch and Reiersol; but before doing so I raise one point of theoretical 
importance in connexion with significance tests of regression coefficients. 













27. When we are fitting regressions (say, for simplicity, with one predicated variate) we 
usually begin with simple linearity and proceed to take in quadratic, cubic and more 
complicated terms as necessity arises. The criterion of ‘necessity’ in this connexion is based 
on the size of the residual variance. One popular method of applying it is to test the 
significance of successive coefficients $b_1$, $b_2$, etc., as they are added and to stop at the point
when one of them fails to attain significance. Now in view of §25 above I doubt whether this
is legitimate in the undetermined case. Suppose a regression line is really quadratic. We
calculate a simple linear regression giving an observed coefficient $b_1$. Now we can only regard
the undetermined case as equivalent to the determined case if the parent regression really is 
linear, which it is not. The least-squares method of estimation and the t-test of significance 
must then fail or, at least, lose their previous theoretical justification. The estimator will be 
biased and the power of the test unknown. 

Or to put it another way: we set up the provisional hypothesis that the regression is linear. 
We find a value of $b_1$ which is not significant on an ordinary t-test. What should be concluded?
The usual conclusion would be that y and x can be regarded as independent, or rather that
the mean of y is constant for all x. But it would be equally valid to conclude that relationship
may exist in a non-linear form. We arrive at the basic contention of the Neyman-Pearson 
theory, that a test should specify the alternative hypothesis. 
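
An extreme case makes the point. In the sketch below y is taken to be exactly x², a hypothetical relation chosen for illustration, and the x values are symmetric about zero. The fitted linear coefficient is then zero although y is completely determined by x.

```python
def linear_coefficient(xs, ys):
    """Least-squares coefficient b1 of y on x, both measured about their means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

xs = [-2, -1, 0, 1, 2]            # symmetric about zero
ys = [x * x for x in xs]          # exact non-linear dependence on x

b1 = linear_coefficient(xs, ys)

# The mean of y is far from constant in x, despite the vanishing b1.
means_differ = ys[0] != ys[2]
```

A vanishing or non-significant b1 is therefore compatible with a strong relationship of non-linear form, which is why the alternative hypothesis has to be specified.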


28. So far I have followed the classical line of development to the extent of assuming that, 
both in the determined and in the undetermined case, the variates are measurable without 
error. This is rather exceptionally so in practice, although cases often arise where errors of 
observation are negligible compared with the actual variation among individuals. If we
wish to take account of random errors of observation an essentially new situation arises. In
fact, several new situations arise, and it is necessary to distinguish between them with some
care.


29. Consider in the first place the undetermined case of variates x, y distributed with 
frequency function f(x,y). We suppose that x is subject to an experimental error v and 
y to an experimental error w. Thus our observed variables x’ and y’ are

$$x' = x + v, \qquad y' = y + w. \tag{38}$$

We meet here with a new kind of logical difficulty. Unless our technique of measurement can
be pushed to the point of perfection we can never observe either x or v. Only the values
x’ appear. Any hypotheses we make about x and v are thus capable of only very indirect 
verification, because they are so difficult to disentangle. In such circumstances it is natural 
to inquire why we do not work directly with the observables x’ and y’ and regard them as our 
primary variables, treating v and w as part of the obscure complex of causation which gives 
rise to the observed variation—‘ washing them into the error term’. Sometimes we may in 
fact do so; but more frequently we desire to separate the error of observation from the other
causes of variation, e.g. because we want to make separate hypotheses about its behaviour,
or because it is a special cause of variation which we hope to be able to reduce.


30. To make much progress we must make some hypothesis about the behaviour of 
v and w. It is usually reasonable to suppose that each of v and w is independent of each of 
x and y and, perhaps, that v and w are independent of each other. In regression analysis we 
may then be interested in the regression of y’ on x, of y’ on x’, of y on x and of y on x’. In fact,










there are four regression lines which coincide in pairs. This part of the subject has been 
treated fully in an admirable review by Lindley (1947), who shows, among other things, that 
the regression of y’ on 2’ is not linear even if that of y on z is linear unless certain conditions 
are obeyed by the cumulants of the distribution of x and y. In PND if the distribution 
of x is normal only normality in v will preserve the linearity of y’ on 2’. 

31. Let us now consider whether we can have a situation corresponding to the determined 
case when the predicated variate is subject to error. Obviously we cannot do so in any strict 
sense, for we cannot fix the value of v in any given case. We can, so to speak, aim at a particular
value of x but not with any assurance of reaching that precise value. Thus our solution of the 
values of the predicated variate is essentially undetermined. To what extent, if any, can we 
regard it as determined for the purposes of estimation or significance tests? 

Something depends here on what we are trying to estimate. We may be interested in the 
regression of y on x as involving some inherent relationship irrespective of the observational 
errors; or we may require the regression of y on x’ because we intend to use the equation for 
prediction and can observe only x’. I consider these two cases in turn. 


32. If the observed variates x', y' are distributed as f(x', y') and this may be written as
$f_1(x, y)\,f_2(v)\,f_3(w)$, then the foregoing argument applies unchanged to show that the maximum
likelihood estimator of the constant $\beta_1$ in $Y = \beta_0 + \beta_1 X$ is given by

$$\sum yx - b_1 \sum x^2 = 0. \tag{39}$$

This obviously must be so, for we are merely ignoring the errors v and w by factorizing them
from the frequency function. But the quantities occurring in (39) are not what we observe.
If we substitute from (38) we find

$$\sum (y' - w)(x' - v) - b_1 \sum (x' - v)^2 = 0$$

or

$$\sum y'x' - \sum wx' - \sum vy' + \sum vw - b_1\left(\sum x'^2 - 2\sum x'v + \sum v^2\right) = 0. \tag{40}$$

Only two of the seven quantities required to solve this equation are observed. To go further
we must argue from the particular to the general. We know from our hypothesis about
v and w that the mean values of terms like $\sum wx'$ and $\sum wv$ vanish. As an unbiased estimating
equation we then have

$$\sum y'x' - b_1\left\{\sum x'^2 + \sum v^2\right\} = 0. \tag{41}$$


This is much better, but there remains the term $\sum v^2$, proportional on the average to $(n-1)\sigma^2/n$,
where $\sigma^2$ is the variance of v. We meet here, as elsewhere throughout the theory of component
analysis, the simple but important fact that we cannot estimate the relationship $Y = \beta_0 + \beta_1 X$
with any known accuracy whatsoever unless we know the variance of the errors of observation.*
This is obvious enough, but it is a point worth bringing into prominence, because in the more 
usual type of statistical inquiry we can estimate the error of observation from the data 
themselves, whereas here we cannot. We must assume something about it or try to determine 
it from collateral experience. 


33. It should also be noticed that although (41) is an unbiased estimating equation the
derived estimator of $\beta_1$, namely,

$$b_1 = \frac{\sum y'x'}{\sum x'^2 + \sum v^2}, \tag{42}$$

is not necessarily unbiased even if we know $\sum v^2$. To some extent this is due to our rather
arbitrary definition of bias. We must draw a distinction between an unbiased estimator and
an unbiased estimating equation.


* If Y is a lagged value of X or vice versa the above statement ceases to be true. 











22 Regression, structure and functional relationship 


34. The estimator $b_1$ given by (42) is clearly not distributed as Student's t, and in the general
case I do not know how to test its significance. Even if we assume the distribution of v known
(with known variance) the problem of significance is formidable. It is not, however,
inaccessible to inquiry; for example, the approximative method used by Welch (1938) on
the Behrens test might be applicable. One thing seems clear: we cannot expect the t-distribution,
or any other distribution, to hold unless we fix both x' and v, which only brings us
back to the case when effectively we can ignore the error of observation. It appears, therefore,
that the determined case cannot arise.


35. Now consider the case when we are interested in determining a regression equation
of type

$$Y = \beta_0' + \beta_1' X', \tag{43}$$

i.e. the regression of y on the observed x'. The same type of argument which we have already
employed leads to the estimating equation

$$\sum yx' - b_1' \sum x'^2 = 0, \tag{44}$$

or

$$\sum y'x' - \sum wx' - b_1' \sum x'^2 = 0. \tag{45}$$

Assuming that v is independent of x we have the unbiased estimating equation

$$\sum y'x' - b_1' \sum x'^2 = 0, \tag{46}$$

which depends only on the observables x' and y'. We do not require the variances of v and w.
The t-test of the estimator $b_1'$ may be applied only if we assume that y' and x' are distributed
in the form

$$f(x', y') = g(x')\,h(y' - \beta_0' - \beta_1' x') \tag{47}$$

and that h is normal.
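
A small simulation may help to fix ideas about the estimating equation (46). The sketch below is not part of the original argument: it assumes normal x, v and w with illustrative variances, an exact relation y = β₁x with no further residual, and a hypothetical helper `slope_on_observed_x`. It solves (46) with the sums taken about the sample means and compares the result with β₁ var x/(var x + var v), the value towards which the slope on the observed x' tends when the errors are independent of x.

```python
import random


def slope_on_observed_x(beta1, sd_x, sd_v, sd_w, n, seed=0):
    """Return b1' = sum(y'x') / sum(x'^2), sums taken about the sample means."""
    rng = random.Random(seed)
    xs_obs, ys_obs = [], []
    for _ in range(n):
        x = rng.gauss(0.0, sd_x)                 # true value of the predicated variate
        y = beta1 * x                            # illustrative exact linear relation
        xs_obs.append(x + rng.gauss(0.0, sd_v))  # x' = x + v
        ys_obs.append(y + rng.gauss(0.0, sd_w))  # y' = y + w
    mean_x = sum(xs_obs) / n
    mean_y = sum(ys_obs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs_obs, ys_obs))
    sxx = sum((a - mean_x) ** 2 for a in xs_obs)
    return sxy / sxx


b = slope_on_observed_x(beta1=2.0, sd_x=1.0, sd_v=1.0, sd_w=0.5, n=100_000)
limit = 2.0 * 1.0 / (1.0 + 1.0)   # beta1 * var(x) / (var(x) + var(v))
print(round(b, 2), limit)
```

With equal variances for x and v the slope on x' settles near one half of β₁, which illustrates why the regression on x' and the regression on x must be kept distinct.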


36. In §§32-35 I have passed over one point in order to arrive at the estimating equations,
and I now return to it. In the case leading to (39) we are assuming that for the variates x and
y there is a basic distribution such that we may write

$$Y_x = \beta_0 + \beta_1 X + \epsilon, \tag{48}$$

and take $\epsilon$ to obey the usual assumptions of normality. Otherwise the estimating equation
(39) is not warranted by maximum likelihood. To justify (44) in the same way we require to
be able to write

$$Y_{x'} = \beta_0' + \beta_1' X' + \epsilon', \tag{49}$$

where again $\epsilon'$ obeys the normality conditions. Now (48) and (49) are not in general compatible.
Suppose in fact that the distribution of x, y is given by $g(x)\,h(y - \beta_0 - \beta_1 x)$ with
characteristic function $\phi_g(t_1 + t_2\beta_1)\,\phi_h(t_2)\,e^{it_2\beta_0}$. The characteristic function of x' and y' is then

$$e^{it_2\beta_0}\,\phi_g(t_1 + t_2\beta_1)\,\phi_h(t_2)\,\phi_v(t_1)\,\phi_w(t_2),$$

and this must be equal to, say,

$$e^{it_2\beta_0'}\,\phi_G(t_1 + t_2\beta_1')\,\phi_H(t_2).$$

Thus

$$\frac{\phi_g(t_1 + t_2\beta_1)\,\phi_v(t_1)}{\phi_G(t_1 + t_2\beta_1')} = \frac{\phi_H(t_2)}{\phi_h(t_2)\,\phi_w(t_2)}\,e^{-it_2(\beta_0 - \beta_0')} = \text{function independent of } t_1 = \frac{\phi_g(t_2\beta_1)}{\phi_G(t_2\beta_1')},$$

as follows from putting $t_1 = 0$ on the left-hand side. Hence

$$\frac{\phi_g(t_1 + t_2\beta_1)}{\phi_g(t_2\beta_1)}\,\phi_v(t_1) = \frac{\phi_G(t_1 + t_2\beta_1')}{\phi_G(t_2\beta_1')}. \tag{50}$$













M. G. KENDALL 23 


In terms of cumulants, writing $\kappa_r^g$ for the rth cumulant of g, etc. and, absorbing i into t
temporarily, we then have

$$\sum_r \frac{\kappa_r^g\{(t_1 + t_2\beta_1)^r - (t_2\beta_1)^r\}}{r!} + \sum_r \frac{\kappa_r^v t_1^r}{r!} = \sum_r \frac{\kappa_r^G\{(t_1 + t_2\beta_1')^r - (t_2\beta_1')^r\}}{r!}. \tag{51}$$

The cumulants $\kappa_1$ merely locate means; the terms of second power give us

$$\kappa_2^g + \kappa_2^v = \kappa_2^G, \tag{52}$$

$$\beta_1 \kappa_2^g = \beta_1' \kappa_2^G. \tag{53}$$

These are equivalent to Lindley's equation (7). Consider now a term of order r>2. The
identification in powers of $t_1$ and $t_2$ gives us r equations, r − 1 of which concern only $\kappa_r^g$ and $\kappa_r^G$.
If $\beta_1$ and $\beta_1'$ are not zero and are not equal this entails the vanishing of those cumulants.
Thus, for variates with existing variances:
If (48) and (49) are to hold for two different non-vanishing values $\beta_1$ and $\beta_1'$ the distributions
g(x), G(x), and f(v) are all normal.
And if the residual $\epsilon$ is normal, y and x are jointly normally distributed.


37. Theorems in which linearity of regression implied normality have been proved by
Allen (1938) and Lindley (1947). Allen proved that if

$$x' = lx + v, \qquad y' = mx + w, \tag{54}$$

then the necessary and sufficient condition for the regression of y' on x' to be linear for all
l and m is that x and v are normally distributed. She required in her proof that all moments
of x and v should exist, but Lindley showed that it is enough to assume the variance of
x finite.*
Lindley proved a general theorem for any number of predicated variates which in the case
of a single variate reduces to this:
If

$$x' = x + v, \qquad y' = y + w,$$

and v, w are independent random variates which are independent of x and y, then the
necessary and sufficient condition that the regression of y' on x', say $Y = \beta_0' + \beta_1' X'$, is linear,
if that of y on x, say $Y = \beta_0 + \beta_1 X$, is linear, is

$$\beta_1' \psi_{x'} = \beta_1 \psi_x, \tag{55}$$

where $\psi$ is the cumulative function (the logarithm of the characteristic function). This may
be put in the equivalent form

$$(\beta_1 - \beta_1')\frac{d\psi_x}{dt} = \beta_1' \frac{d\psi_v}{dt}. \tag{56}$$

The reason that the theorem of §36 goes further than Lindley's result (though consistent
with it) is that I am assuming more than linearity of regression. He requires only linearity;
I require the residual $\epsilon'$ to be independent of X'. The effect of my theorem is that I can only
get it if the distributions of x and v are normal.


* Rao (1947, and correction, 1949) and Fix (1949) have relaxed the conditions still further.

38. The effect of such results on the investigation of regression lines in practice is clearly
important. The errors of observation impair our estimators, vitiate our tests of significance
and even bend our regression lines unless we are prepared to postulate normality in the
variates, which is asking rather a lot, particularly in economic work. But they are not quite
as devastating as they look. A slight departure from linearity will sometimes allow the
ordinary theory to be used as an approximation. None the less, it would appear that much
more effort is needed to reduce errors of observation than has sometimes been supposed.
We are, I hope, moving out of the phase in statistical development when we gloried in
a profusion of errors because of a pious hope in the central limit effect and the supposition
that everything unknown could be swept into a single error term.


39. I turn to the subject of functional relationships between variables. Let us begin by
considering the problem of estimating the 'true' linear relation between two variables X and
Y, both subject to errors of observation. This is the situation common in physics where we
have, say, a number of observations on the length and temperature of a metal bar and wish
to estimate the constant $\alpha$ in the relation $L = \alpha T$, or in economics where we postulate
a relation between demand D and price P

$$\log P = \alpha \log D.$$

In such a case we are not concerned with a parental population of values of random variates
x and y, l and t, or d and p. As a rule one of our variables at least can be arranged to have
approximately any value we like (though this is rarely so in economics) with any frequency
we like. The problem is no longer to estimate a regression coefficient. It is to estimate the
constants in a functional relation

$$Y = \alpha_0 + \alpha_1 X. \tag{57}$$

40. The fundamental difficulty here is that we cannot replicate our observations by fixing
values of X or Y. We observe two quantities x' and y' given by

$$x'_X = X + v_X, \qquad y'_Y = Y + w_Y, \tag{58}$$

but as v and w are random variables we can never estimate the corresponding value of X and
Y even if we make the usual assumptions that $v_X$ does not depend on X, $w_Y$ on Y and that
they are independent of each other. We have from (58) and (57)

$$(y'_Y - w_Y) = \alpha_0 + \alpha_1 (x'_X - v_X), \tag{59}$$

and our random variables y' and x' are dependent on parameters Y and X and hence may
vary from one observation to another, so that we never know whether we are observing the
same distribution; and in fact we can safely assume in most cases that we are not. Equation
(59) is equivalent to

$$y'_Y = \alpha_0 + \alpha_1 x'_X + w_Y - \alpha_1 v_X, \tag{60}$$

and if we want to estimate $\alpha_0$ and $\alpha_1$ from the observed y' and x' we have to make some
assumption about the 'residual' $\alpha_0 + w_Y - \alpha_1 v_X$, which depends on $\alpha_0$ and $\alpha_1$. Or to put it
another way, we require to be able to estimate not only $\alpha_0$ and $\alpha_1$ but X and Y for each
observation. This brings us back to the case of essential indeterminacy in component
analysis which I mentioned in §21. In fact, our present problem is one of the simpler problems
in component analysis.


41. What, then, are we to do? It is clearly no use telling the physicist that his problem is 
insoluble. He solves it often enough on intuitive grounds. What we can fairly tell him is that 
it is insoluble without some further information about the two error distributions of v and w; 
and in fact all that we need to know is the ratio of the variances of v and w.










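For the observed sample of n values of x and y we have the likelihood function

$$\frac{1}{(2\pi)^n (\operatorname{var} v \operatorname{var} w)^{n/2}} \exp\left\{-\frac{\sum (x'_X - X)^2}{2\operatorname{var} v} - \frac{\sum (y'_Y - Y)^2}{2\operatorname{var} w}\right\}
= \frac{1}{(2\pi)^n \sigma_v^n \sigma_w^n} \exp\left\{-\frac{\sum (x'_X - X)^2}{2\sigma_v^2} - \frac{\sum (y'_Y - \alpha_0 - \alpha_1 X)^2}{2\sigma_w^2}\right\}.$$

But if we maximize this likelihood for variations in $X_1, \ldots, X_n$, $\sigma_v$ and $\sigma_w$, we get an unacceptable
result (cf. Lindley, 1947, p. 235 and Kendall, 1950, p. 80), namely, that $\alpha_1^2 = \sigma_w^2/\sigma_v^2$. We must,
therefore, introduce a further assumption. Lindley remarks that the most common assumption
is that the ratio $\sigma_w/\sigma_v$ is known. This produces consistent estimates of $\alpha_0$ and $\alpha_1$.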
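
To indicate what the assumption of a known ratio buys in practice, the following sketch fits the line (57) when λ = var w/var v is treated as given. The closed form used is the standard solution usually called Deming regression; it is not written out in the text above, and the function name `deming_fit` and the sample values are illustrative assumptions only.

```python
import math


def deming_fit(xs, ys, lam):
    """Fit Y = a0 + a1*X, taking lam = var(w)/var(v) as known (assumed input)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("sample covariance is zero; the slope is not determined")
    d = syy - lam * sxx
    # Root of the quadratic in a1 whose sign agrees with that of the covariance.
    a1 = (d + math.sqrt(d * d + 4.0 * lam * sxy * sxy)) / (2.0 * sxy)
    a0 = mean_y - a1 * mean_x
    return a0, a1


# Observations lying exactly on Y = 1 + 2X are recovered whatever ratio is assumed.
a0, a1 = deming_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], lam=1.0)
print(a0, a1)
```

As λ becomes very large the errors in x' are in effect negligible and the slope approaches the ordinary regression of y' on x'; as λ approaches zero it approaches the reciprocal of the regression of x' on y'.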


42. I leave the question of functional relationship at this point. It cannot be said that the 
problems have been satisfactorily solved. The method of maximum likelihood seems to 
break down, and we require a new principle of inference or a new assumption concerning the 
error variance before we can make progress. It may well be that different problems will 
require different assumptions so that no general solution exists, or it may be that we can only 
estimate limits to the error variances and hence that a solution of the problem is possible only 
in approximate terms. 


43. I had intended to include in this paper an examination of some questions arising on 
structural equations affected by stochastic elements. There has recently appeared a very 
important monograph from the Cowles Commission (Koopmans, 1950) which brings together 
a great deal of the work done by econometric writers on this subject. It settles some 
difficulties, but raises many others, and my treatment will have to be more extended than was 
originally proposed. I therefore intend to return to the matter in a separate Part II of this 
paper which will appear at a later date. 


REFERENCES 


ALLEN, H. V. (1938). A theorem concerning the linearity of regression. Stat. Res. Mem. 2, 60.

FISHER, R. A. (1922). The goodness of fit of regression formulae and the distribution of regression
coefficients. J. R. Statist. Soc. 85, 97.

FIX, EVELYN (1949). Distributions which lead to linear regressions. Proceedings of the Berkeley
Symposium on Mathematical Statistics and Probability. University of California Press.

KENDALL, M. G. (1948). The Advanced Theory of Statistics, 1, 4th ed. London: Charles Griffin and Co.

KENDALL, M. G. (1950). Factor analysis as a statistical technique. J. R. Statist. Soc. B, 12, 60.

KOOPMANS, T. C. (editor) (1950). Statistical Inference in Dynamic Economic Models. New York:
John Wiley and Sons; London: Chapman and Hall.

LINDLEY, D. V. (1947). Regression lines and the linear functional relationship. J. R. Statist. Soc.
Suppl. 9, 218.

RAO, C. R. (1947, 1949). Note on a problem of Ragnar Frisch. Econometrica, 15, 245 and Correction,
Econometrica, 17, 212.

WELCH, B. L. (1938). The significance of the differences of two means when the population variances
are unequal. Biometrika, 29, 350.










PARTIAL AND MULTIPLE RANK CORRELATION 
By P. A. P. MORAN, Institute of Statistics, University of Oxford 


Apart from a paper by M. G. Kendall (1942), partial rank correlation has been very little
considered and the analogous definitions of multiple rank correlation have not been considered
at all. The present paper considers some of the problems which arise in the study of
such coefficients.

By using the generalized method of defining coefficients of rank correlation, given by
Daniels (1944), Spearman's coefficient of rank correlation $r_s$, and Kendall's t can both be
shown to be product-moment correlation coefficients between sets of suitably defined scores
$a_{ij}$ attached to each pair of elements in the ranking. Whatever the probability distribution
assumed to account for the ranking, these scores are not independently distributed and this
fact bedevils the resulting distribution theory of the coefficients. If all the n! permutations
of the ranks are equiprobable, $r_s$ has a variance 1/(n − 1) and t a variance 2(2n + 5)/9n(n − 1).
They are both distributed symmetrically between the limits ±1. Kendall has shown that
the distribution of $r_s$ is well approximated by a Beta-type distribution so that

$$r_s\left\{\frac{n-2}{1-r_s^2}\right\}^{1/2} \tag{1}$$

has Student's t-distribution, with n − 2 degrees of freedom, to a high degree of approximation.


It is perhaps worth pointing out that the same method gives a good approximation to
a test for t and that, analogously to $r_s$, the distribution of

$$t\left\{\frac{\dfrac{9n(n-1)}{2(2n+5)} - 1}{1 - t^2}\right\}^{1/2}, \tag{2}$$

where t is Kendall's t, is well approximated by Student's t-distribution with

$$\frac{9n(n-1)}{2(2n+5)} - 1$$

'degrees of freedom'. The use of this in practice may involve interpolation for fractional
'degrees of freedom'. For n>10 Kendall has shown that the normal approximation is
satisfactory.

In order to test how good an approximation is given by the above procedure, the exact
probabilities for the tails of the distribution of $S = \tfrac{1}{2}n(n-1)t$ given by Kendall (1948) for
n = 8, 9, 10 were compared with values given by testing t as if it were an ordinary
product-moment correlation coefficient based on 1 + {9n(n − 1)/2(2n + 5)} pairs of observations.
Subtracting unity from S to obtain a continuity correction, dividing by $\tfrac{1}{2}n(n-1)$, and doing
a double interpolation in Miss David's tables (1938) of the Correlation Coefficient we obtain
the probability levels given in Table 1 below under the headings 'Approx.' There is a possible
rounding error of unity in the last decimal place of these probabilities. It is clear that
agreement is good and the test can be safely used for n > 8. A somewhat better fit would be
obtained by dividing S by $\tfrac{1}{2}n(n-1) + 2$ instead of $\tfrac{1}{2}n(n-1)$. The formulae are then rather
more complicated.
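
The 'Exact' entries of Table 1 can be checked by direct counting, since under equiprobable permutations S equals ½n(n − 1) less twice the number of inversions. The sketch below is an illustration only and is not taken from Kendall's tables: the function names are hypothetical, and the second function simply evaluates the fractional 'degrees of freedom' appearing in (2).

```python
from fractions import Fraction
from math import factorial


def inversion_counts(n):
    """counts[i] = number of permutations of n objects with exactly i inversions."""
    counts = [1]
    for m in range(2, n + 1):
        new = [0] * (len(counts) + m - 1)
        for i, c in enumerate(counts):
            for shift in range(m):        # the m-th object can add 0..m-1 inversions
                new[i + shift] += c
        counts = new
    return counts


def exact_upper_tail(n, k):
    """P(S >= k) under equiprobable permutations, with S = n(n-1)/2 - 2*(inversions)."""
    pairs = n * (n - 1) // 2
    counts = inversion_counts(n)
    favourable = sum(c for i, c in enumerate(counts) if pairs - 2 * i >= k)
    return Fraction(favourable, factorial(n))


def equivalent_df(n):
    """Fractional 'degrees of freedom' 9n(n-1)/{2(2n+5)} - 1 used with (2)."""
    return Fraction(9 * n * (n - 1), 2 * (2 * n + 5)) - 1


print(float(exact_upper_tail(8, 14)), float(exact_upper_tail(8, 16)))
print(float(equivalent_df(8)))
```

For n = 8 the two tail probabilities are 2191/40320 and 1230/40320, which round to the tabulated 0.054 and 0.031.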

Table 1. Probabilities that S ≥ k for n = 8, 9, 10

              n = 8               n = 9                    n = 10
   k     Exact   Approx.     Exact   Approx.      k     Exact   Approx.
  14     0.054   0.055         —       —         19     0.054   0.055
  16     0.031   0.030       0.060   0.060       21     0.036   0.036
  18     0.016   0.014       0.038   0.037       23     0.023   0.023
  20       —       —         0.022   0.021       25     0.014   0.013
  22       —       —         0.012   0.011

The fact that t has a considerably smaller variance than $r_s$ does not imply that it provides
a more powerful test of independence than $r_s$ any more than the fact that

$$\operatorname{var}(r_s) = 1/(n-1) = \operatorname{var}(\text{product-moment correlation coefficient, } r)$$

implies that a test based on $r_s$ is as powerful as one based on r. If we take as our alternative
hypothesis that the ranks are derived by ranking a sample of n pairs of observations $(x_i, y_i)$
from a bivariate normal distribution with $\rho \neq 0$, the relative power of the three tests for
large n can be gauged by their efficiency in estimating $\rho$, since we know their expectations
and variance. For in this case we have (see e.g. Kendall (1949))

$$E(t) = \frac{2}{\pi}\sin^{-1}\rho, \quad \text{(Greiner)} \tag{3}$$

$$\operatorname{var}(t) = \frac{2}{n(n-1)}\left\{1 - \left(\frac{2}{\pi}\sin^{-1}\rho\right)^2 + 2(n-2)\left[\frac{1}{9} - \left(\frac{2}{\pi}\sin^{-1}\tfrac{1}{2}\rho\right)^2\right]\right\}, \quad \text{(Esscher)} \tag{4}$$

$$E(r_s) = \frac{6}{\pi(n+1)}\left\{\sin^{-1}\rho + (n-2)\sin^{-1}\tfrac{1}{2}\rho\right\}, \quad \text{(Moran)} \tag{5}$$

$$\operatorname{var}(r_s) = \frac{1}{n}\{1 - 1.563465\rho^2 + \ldots\} + O(n^{-2}). \quad \text{(Kendall)} \tag{6}$$

If n is large and $\rho$ small we therefore have

$$E(t) \doteq \frac{2}{\pi}\rho, \qquad \operatorname{var}(t) \doteq \frac{4}{9n},$$

$$E(r_s) \doteq \frac{3}{\pi}\rho, \qquad \operatorname{var}(r_s) \doteq \frac{1}{n},$$

and therefore

$$\frac{E(t)}{\{\operatorname{var}(t)\}^{1/2}} \doteq \frac{3n^{1/2}\rho}{\pi} \doteq \frac{E(r_s)}{\{\operatorname{var}(r_s)\}^{1/2}}.$$


It therefore follows that, under these assumptions, t and $r_s$ are asymptotically equally
efficient in estimating $\rho$ when $\rho$ is small and equally powerful in testing the hypothesis that
$\rho = 0$. This result is not surprising when we recall that in the universe of equiprobable
permutations the correlation of t and $r_s$ tends to unity very rapidly as n increases (Kendall
et al. (1939)), so that for most rankings, t is about two-thirds of $r_s$. It has been recently shown (David,
Kendall & Stuart, 1951) that t and $r_s$ are also highly correlated when calculated from rankings
derived from bivariate normal distributions.

Hotelling and Pabst (1936) have shown that when used to provide an estimator of $\rho$, $r_s$ has
an efficiency asymptotically equal to $9\pi^{-2} = 0.91$, so that it follows that the use of t and
$r_s$ involves a loss of about 9 % of the information. K. Pearson also regarded $9\pi^{-2}$ as a measure
of efficiency, but approached the problem from the point of view of 'grades'.
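
The arithmetic behind the last two paragraphs is easily checked. The sketch below uses only the large-n, small-ρ approximations quoted above; the function name and the particular values of ρ and n are illustrative assumptions.

```python
import math


def standardized_means(rho, n):
    """Large-n, small-rho approximations to E/sd for Kendall's t and Spearman's r_s."""
    t_ratio = (2.0 / math.pi) * rho / math.sqrt(4.0 / (9.0 * n))
    rs_ratio = (3.0 / math.pi) * rho / math.sqrt(1.0 / n)
    return t_ratio, rs_ratio


t_ratio, rs_ratio = standardized_means(rho=0.1, n=400)
efficiency = 9.0 / math.pi ** 2   # asymptotic efficiency quoted from Hotelling and Pabst
print(t_ratio, rs_ratio, round(efficiency, 2))
```

Both ratios reduce to 3n^{1/2}ρ/π, so the two coefficients stand or fall together under these approximations.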










PARTIAL t

Given three rankings, which we denote by 1, 2 and 3, Kendall (1942) defines a partial ranking
correlation coefficient $t_{12.3}$ in such a way that

$$t_{12.3} = \frac{t_{12} - t_{13}t_{23}}{(1 - t_{13}^2)^{1/2}(1 - t_{23}^2)^{1/2}}. \tag{7}$$

In fact, $t_{12.3}$ is the partial correlation between the sets of t-scores. Very little is known about
the distribution of $t_{12.3}$. Hoeffding (1948) has shown that if $\tau_{12}$, $\tau_{13}$, $\tau_{23}$ are defined for a continuous
population and the corresponding t's obtained by ranking a sample from this
population, then the distribution of $n^{1/2}(t_{12.3} - \tau_{12.3})$ tends to normality with zero mean, and
a variance which he evaluates, provided $\tau_{13}^2 \neq 1$, $\tau_{23}^2 \neq 1$. If in addition $\tau_{13} = \tau_{23} = 0$, the
distribution of $n^{1/2}(t_{12.3} - \tau_{12.3})$ is asymptotically the same as that of $n^{1/2}(t_{12} - \tau_{12})$.

It seems very difficult to prove anything about the distribution of $t_{12.3}$ for small n. We
first notice that we must eliminate from our consideration all those sets of rankings for which
$t_{13} = \pm 1$ or $t_{23} = \pm 1$ for in such cases $t_{12.3}$ is undefined. In order to see what happens in practice
I calculated (7) in the case where n = 4, ranking 3 is (1234) and rankings 1 and 2 have all
permutations of (1234) except (1234) and (4321). There are $(4! - 2)^2$ of these, but the results
are symmetrical in rankings 1 and 2 and the coefficient reverses its sign when either of the
latter is reversed. It follows that only about $\tfrac{1}{8}(4! - 2)^2$ calculations are necessary. The result
is shown in Table 2, which gives values of $t_{12.3}$ for positive values of $t_{13}$ and $t_{23}$.


Table 2. Values of $t_{12.3}$

Ranking 1             2134     1324     1243     2314     2143     3124     1342     1423     3214     2341     2413
t13                 0.6667   0.6667   0.6667   0.3333   0.3333   0.3333   0.3333   0.3333   0        0        0
Ranking 2   t23
2134      0.6667    1.0000  -0.2000  -0.2000   0.6325   0.6325  -0.3162  -0.3162  -0.3162   0.4472   0.4472   0.4472
1324      0.6667   -0.2000   1.0000  -0.2000  -0.3162  -0.3162   0.6325   0.6325  -0.3162   0.4472  -0.4472  -0.4472
1243      0.6667   -0.2000  -0.2000   1.0000  -0.3162   0.6325  -0.3162  -0.3162   0.6325  -0.4472  -0.4472   0.4472
2314      0.3333    0.6325  -0.3162  -0.3162   1.0000   0.2500   0.2500  -0.5000  -0.5000   0.7071   0.7071   0
2143      0.3333    0.6325  -0.3162   0.6325   0.2500   1.0000  -0.5000  -0.5000  -0.5000   0        0        0.7071
3124      0.3333   -0.3162   0.6325  -0.3162   0.2500  -0.5000   1.0000   0.2500  -0.5000   0.7071   0       -0.7071
1342      0.3333   -0.3162   0.6325  -0.3162  -0.5000  -0.5000   0.2500   1.0000   0.2500   0       -0.7071  -0.7071
1423      0.3333   -0.3162  -0.3162   0.6325  -0.5000  -0.5000  -0.5000   0.2500   1.0000  -0.7071  -0.7071   0
3214      0         0.4472   0.4472  -0.4472   0.7071   0        0.7071   0       -0.7071   1.0000   0.3333  -0.3333
2341      0         0.4472  -0.4472  -0.4472   0.7071   0        0       -0.7071  -0.7071   0.3333   1.0000   0.3333
2413      0         0.4472  -0.4472   0.4472   0        0.7071  -0.7071  -0.7071   0       -0.3333   0.3333   1.0000
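
Individual entries of Table 2 can be recomputed from (7) in a few lines. The sketch below is an illustration only: it assumes that each ranking is read as the list of objects in rank order (an assumption about the notation, chosen because it reproduces the tabulated entry 0.6325 for rankings 2134 and 2314), and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_t(order_a, order_b):
    """Kendall's t between two rankings, each given as the objects listed in rank order."""
    pos_a = {obj: r for r, obj in enumerate(order_a)}
    pos_b = {obj: r for r, obj in enumerate(order_b)}
    score = 0
    for u, v in combinations(order_a, 2):
        agree = (pos_a[u] - pos_a[v]) * (pos_b[u] - pos_b[v])
        score += 1 if agree > 0 else -1    # +1 for a concordant pair, -1 otherwise
    n = len(order_a)
    return 2.0 * score / (n * (n - 1))


def partial_t(rank1, rank2, rank3):
    """Equation (7): partial t of rankings 1 and 2 with ranking 3 eliminated."""
    t12 = kendall_t(rank1, rank2)
    t13 = kendall_t(rank1, rank3)
    t23 = kendall_t(rank2, rank3)
    return (t12 - t13 * t23) / sqrt((1 - t13 ** 2) * (1 - t23 ** 2))


natural = (1, 2, 3, 4)
print(round(partial_t((2, 1, 3, 4), (2, 3, 1, 4), natural), 4))
print(round(partial_t((2, 1, 3, 4), (1, 3, 2, 4), natural), 4))
```

The function fails with a division by zero when ranking 1 or ranking 2 coincides with ranking 3 or its reverse, which corresponds to the cases excluded above.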












































From this it is clear that, unlike the case of product moment partial correlation, $E(t_{12.3})$,
when $t_{13}$ and $t_{23}$ are kept fixed, is not necessarily equal to zero. This is also true for other values
of n. Suppose rankings 1 and 2 are each obtained from 3 by a single interchange of nearest
neighbours, then $t_{13}$ and $t_{23}$ are each equal to $1 - \dfrac{4}{n(n-1)}$, and $t_{12}$ is equal to 1 (with probability
$(n-1)^{-1}$) or equal to $1 - \dfrac{8}{n(n-1)}$ (with probability $(n-2)/(n-1)$). It will then be found
that the expectation of the numerator of $t_{12.3}$ is not equal to zero.

I have not been able to find a formula either for $E(t_{12.3} \mid t_{13}, t_{23})$ or for $\operatorname{var}(t_{12.3})$ when
$t_{13}$, $t_{23}$ are fixed or variable. Since $t_{12}$ is approximately distributed as r based on

$$\frac{9n(n-1)}{2(2n+5)} + 1$$

pairs of observations, it is natural to suppose that $t_{12.3}$ is approximately distributed as
r based on

$$\frac{9n(n-1)}{2(2n+5)}$$

pairs of observations. If we keep $t_{13}$ and $t_{23}$ fixed, only a few values of
$t_{12.3}$ are possible when n = 4 and we therefore take the case where rankings 1 and 2 range
over all possible permutations except (1234) and (4321). The frequency distribution of $t_{12.3}$ is
then given in Table 3 which is obtained from Table 2 when completed by reflexion. As the
distribution of $t_{12.3}$ is symmetric, the frequencies are only given for the positive values of
$t_{12.3}$ together with the whole frequency for $t_{12.3} = 0$.


Table 3

t12.3        0    0.2000   0.2500   0.3162   0.3333   0.4472   0.5000   0.6325   0.7071   1.0000
Frequency   24      6        8       18        6       18       12       12       18       11









































The variance of this distribution is $0.282 = (3.55)^{-1}$. On the other hand 9n(n − 1)/2(2n + 5)
for n = 4 is equal to 4.154 so that agreement is not very good. One source of discrepancy
arises from the fact that 2(2n + 5)/9n(n − 1) is the variance of t over all permutations and in
the above we have omitted the rankings (1234) and (4321). If we make a correction for this,
however, the discrepancy is increased. It is unfortunate that calculations of the above kind
quickly become impracticable when n increases. The case n = 5, for example, would require
approximately $5^2 = 25$ times as much calculation as n = 4. It therefore seems that no very clear
conclusions about the distribution of partial t can be deduced from the above discussion.


MULTIPLE RANK CORRELATION 


It is, however, possible to make somewhat more progress with the theory of multiple rank
correlation. Taking first a definition based on t-scores, we write ${}^tR^2_{1(23)}$ for the square of the
multiple t-rank correlation coefficient between a ranking 1 and two rankings 2, 3 and we
define it by the equation

$$1 - {}^tR^2_{1(23)} = (1 - t_{13}^2)(1 - t_{12.3}^2). \tag{8}$$

${}^tR^2_{1(23)}$ is, in fact, the square of the product moment multiple correlation coefficient between
the set of scores for the ranking 1 and the sets for rankings 2 and 3.
Inserting the value of $t_{12.3}$ in terms of $t_{12}$, $t_{13}$ and $t_{23}$, we find

$${}^tR^2_{1(23)} = \frac{t_{12}^2 + t_{13}^2 - 2t_{12}t_{13}t_{23}}{1 - t_{23}^2}. \tag{9}$$

We now find the expectation of (9) when rankings 2 and 3 are kept fixed and ranking
1 ranges, with equal probability, over all possible permutations of (1, ..., n). In terms of this
expectation we can then determine an approximate distribution for ${}^tR^2_{1(23)}$. From (9) it
follows that

$$E({}^tR^2_{1(23)}) = \frac{2}{1 - t_{23}^2}\left\{E(t_{13}^2) - t_{23}E(t_{12}t_{13} \mid t_{23})\right\}
= \frac{2}{1 - t_{23}^2}\left\{\frac{2(2n+5)}{9n(n-1)} - t_{23}E(t_{12}t_{13} \mid t_{23})\right\}. \tag{10}$$

Take ranking 3 to be in the natural order 1, 2, ..., n. Let $a_{ij}$, $b_{ij}$ (i<j) be the scores for the
rankings 1 and 2 relative to 3. Then $a_{ij}b_{ij}$ will be the score of 2 relative to 1. To evaluate the
second term inside the bracket in (10) we have to find

$$E(t_{12}t_{13}) = \left\{\frac{2}{n(n-1)}\right\}^2 E\left\{\left(\sum a_{ij}b_{ij}\right)\left(\sum a_{pq}\right)\right\}$$















when ranking 1 has equal probability for all its n! possible permutations and rankings
2 and 3 are kept fixed. Then

$$E\left\{\left(\sum a_{ij}b_{ij}\right)\left(\sum a_{pq}\right)\right\} = E\left\{\sum a_{ij}^2 b_{ij} + \sum a_{ij}a_{ik}b_{ij} + \sum a_{ij}a_{jk}b_{ij} + \sum a_{ij}a_{ki}b_{ij} + \sum a_{ij}a_{kj}b_{ij} + \sum a_{ij}a_{kl}b_{ij}\right\}$$

$$= c_1 + c_2 + c_3 + c_4 + c_5 + c_6, \text{ say}.$$

In the right-hand side of the above expression, different letters used for suffixes are to be
understood as denoting different integers. We now find each of these terms separately.

Clearly $c_1 = E\left(\sum a_{ij}^2 b_{ij}\right) = \sum b_{ij}$.

To find the next four terms, consider the six possible permutations of three objects. By
making a table of the corresponding scores we easily see that

$$E(a_{ij}a_{ik}) = E(a_{ij}a_{kj}) = \tfrac{1}{3},$$

$$E(a_{ij}a_{jk}) = E(a_{ij}a_{ki}) = -\tfrac{1}{3}.$$

Given i and j, there are n − i − 1 possible values for k in the sum $\sum a_{ij}a_{ik}b_{ij}$ and therefore
$c_2 = \tfrac{1}{3}\sum_{i<j}(n - i - 1)b_{ij}$. Similarly we find

$$c_3 = -\tfrac{1}{3}\sum (n - j)\,b_{ij},$$

$$c_4 = -\tfrac{1}{3}\sum (i - 1)\,b_{ij},$$

$$c_5 = +\tfrac{1}{3}\sum (j - 2)\,b_{ij}.$$

Finally it is obvious that $c_6 = 0$. Adding we get

$$E\left\{\left(\sum a_{ij}b_{ij}\right)\left(\sum a_{pq}\right)\right\} = \tfrac{1}{3}\sum b_{ij}(1 + 2j - 2i).$$


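In the above system of scoring we have only considered scores $a_{ij}$, $b_{ij}$ for i<j. We now
revert to Daniels's original scoring system and write

$$B_{ij} = b_{ij} \text{ if } i<j, \qquad B_{ij} = -b_{ji} \text{ if } i>j, \qquad B_{ij} = 0 \text{ if } i=j.$$

Then

$$2\sum_{i<j} b_{ij}(j - i) = \sum_{i=1}^{n}\sum_{j=1}^{n} B_{ij}(j - i) = -2\sum_{i=1}^{n} i \sum_{j=1}^{n} B_{ij}.$$

But

$$\sum_{j=1}^{n} B_{ij} = n + 1 - 2p(i),$$

where p(i) is the rank of the i-th object in ranking 2. This is proved in Daniels (1944), p. 131.
We therefore have

$$\sum_{i=1}^{n}\sum_{j=1}^{n} iB_{ij} = \sum_{i=1}^{n} i\{n + 1 - 2p(i)\} = 2\left\{\tfrac{1}{4}n(n+1)^2 - \sum_{i=1}^{n} ip(i)\right\}.$$

But the Spearman rank correlation coefficient between rankings 2 and 3 which we denote
by $r_{s23}$ is

$$r_{s23} = \frac{\sum_{i=1}^{n} ip(i) - \tfrac{1}{4}n(n+1)^2}{\tfrac{1}{12}n(n^2-1)},$$

and therefore

$$\sum_{i<j} b_{ij}(j - i) = \tfrac{1}{6}n(n^2 - 1)\,r_{s23}.$$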
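
The identity just obtained lends itself to a direct numerical check. The sketch below is an illustration only: it takes $b_{ij} = +1$ when $p(i) < p(j)$ and −1 otherwise, uses exact rational arithmetic, and tries every permutation for n = 5; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def weighted_score_sum(p):
    """Sum over i<j of b_ij (j - i), with b_ij = +1 if p(i) < p(j) and -1 otherwise."""
    n = len(p)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            b = 1 if p[i] < p[j] else -1
            total += b * (j - i)
    return total


def spearman_vs_natural(p):
    """Spearman's r_s between the ranks p(1), ..., p(n) and the natural order 1, ..., n."""
    n = len(p)
    cross = sum((i + 1) * rank for i, rank in enumerate(p))
    return Fraction(4 * cross - n * (n + 1) ** 2, 4) / Fraction(n * (n * n - 1), 12)


n = 5
identity_holds = all(
    weighted_score_sum(p) == Fraction(n * (n * n - 1), 6) * spearman_vs_natural(p)
    for p in permutations(range(1, n + 1))
)
print(identity_holds)
```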













We thus find

$$\sum c_i = \tfrac{1}{6}n(n-1)\left\{t_{23} + \tfrac{2}{3}(n+1)\,r_{s23}\right\}$$

and

$$E({}^tR^2_{1(23)}) = \frac{2}{9n(n-1)(1 - t_{23}^2)}\left\{2(2n+5) - 6t_{23}^2 - 4(n+1)\,t_{23}\,r_{s23}\right\}. \tag{11}$$

This is a somewhat surprising result since it involves both $t_{23}$ and $r_{s23}$.
I have calculated the exact distribution of ${}^tR^2_{1(23)}$ for n = 4, (3124) as ranking 2, and (1234)
as ranking 3. In this case we have $t_{23} = 0.3333$, $r_{s23} = 0.4$ and the value of $E({}^tR^2_{1(23)})$ given by
(11) is 17/36. The observed distribution of ${}^tR^2_{1(23)}$ is given in Table 4.


Table 4. Distribution of ${}^tR^2_{1(23)}$ in set of 24 permutations

tR²1(23)     0    0.1667   0.3333   0.5   0.6667   1.0000
Frequency    2      4        4       8      2        4
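
The agreement between (11) and the enumeration can be confirmed exactly. The sketch below is an illustration only: it evaluates (9) for every permutation of ranking 1 with rankings 2 and 3 as stated, averages the results in rational arithmetic, and compares the average with (11). Each ranking is read as the objects listed in rank order, which is an assumption about the notation, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations


def kendall_t(order_a, order_b):
    """Exact Kendall's t between two rankings given as objects listed in rank order."""
    pos_a = {obj: r for r, obj in enumerate(order_a)}
    pos_b = {obj: r for r, obj in enumerate(order_b)}
    score = sum(
        1 if (pos_a[u] - pos_a[v]) * (pos_b[u] - pos_b[v]) > 0 else -1
        for u, v in combinations(order_a, 2)
    )
    n = len(order_a)
    return Fraction(2 * score, n * (n - 1))


def multiple_r2(rank1, rank2, rank3):
    """Equation (9): squared multiple t-rank correlation of ranking 1 on rankings 2, 3."""
    t12 = kendall_t(rank1, rank2)
    t13 = kendall_t(rank1, rank3)
    t23 = kendall_t(rank2, rank3)
    return (t12 ** 2 + t13 ** 2 - 2 * t12 * t13 * t23) / (1 - t23 ** 2)


def expected_r2(n, t23, rs23):
    """Equation (11): expectation of (9) over all permutations of ranking 1."""
    bracket = 2 * (2 * n + 5) - 6 * t23 ** 2 - 4 * (n + 1) * t23 * rs23
    return 2 * bracket / (9 * n * (n - 1) * (1 - t23 ** 2))


rank2, rank3 = (3, 1, 2, 4), (1, 2, 3, 4)
values = [multiple_r2(rank1, rank2, rank3) for rank1 in permutations(rank3)]
mean_r2 = sum(values) / len(values)
formula = expected_r2(4, Fraction(1, 3), Fraction(2, 5))
print(mean_r2, formula)
```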





























It is easy to check from this that $E({}^tR^2_{1(23)}) = 17/36$. It will be seen that this distribution is not
so smoothly spread out as that of $t_{12.3}$ when all permutations are possible, but this is to
be expected since there are only 4! cases to be considered. For larger n (how large it is
difficult to say exactly) it is probably reasonable to test ${}^tR^2_{1(23)}$ as if it were the square of an
ordinary multiple correlation whose expectation is equal to (11). For such a correlation
coefficient we have (Kendall (1947), p. 382)

$$E(R^2_{1(2 \ldots p)}) = \frac{p-1}{N-1},$$

where p − 1 is the number of independent variates and N is the size of the sample. Then
$(1 - R^2_{1(23)})/R^2_{1(23)}$ is distributed as Snedecor's F with N − p and p − 1 degrees of freedom. To
obtain an approximation to the distribution of ${}^tR^2_{1(23)}$ we therefore put p − 1 = 2 and define
N* by

$$E({}^tR^2_{1(23)}) = \frac{2}{N^* - 1}.$$

We then test

$$\frac{1 - {}^tR^2_{1(23)}}{{}^tR^2_{1(23)}}$$

as Snedecor’s F with N*—2 and 2 degrees of freedom. Unfortunately the case n = 4 is too 
rough for any useful comparison with this approximation. 

Multiple correlation based on Spearman’s r, can be discussed in the same way, but does not 
appear to have any advantages over the above method, for small n at least. However, when 
n increases, t becomes more arduous to calculate than r, and coefficients based on Spearman 
coefficients may be easier to use. The expectation of the square of the Spearman multiple 
correlation coefficient can be found by the same kind of argument as that used above. The 
reason why, in both cases, multiple correlation coefficients are easier to handle than partial 
ones, is that in the former case it is only the numerator which is a random variable and not 
both numerator and denominator. 













REFERENCES 


DANIELS, H. E. (1944). Biometrika, 33, 129.

DAVID, F. N. (1938). Tables of the Correlation Coefficient. Biometrika Office, London.

DAVID, S. T., KENDALL, M. G. & STUART, A. (1951). Biometrika, 38, 131.

HOEFFDING, W. (1948). Ann. Math. Statist. 19, 293.

HOTELLING, H. & PABST, M. R. (1936). Ann. Math. Statist. 7, 29.

KENDALL, M. G. et al. (1939). Biometrika, 30, 251.

KENDALL, M. G. (1942). Biometrika, 32, 277.

KENDALL, M. G. (1947). The Advanced Theory of Statistics, 1, 3rd ed. London: Charles Griffin and Co.

KENDALL, M. G. (1948). Rank Correlation Methods. London: Charles Griffin and Co.

KENDALL, M. G. (1949). Biometrika, 36, 177.







AN APPLICATION OF THE DISTRIBUTION OF THE
RANKING CONCORDANCE COEFFICIENT 


By A. STUART, Division of Research Techniques, London School of Economics 


1. Pitman (1938) considered the analysis of variance test, from which the ranking con- 
cordance test emerges as a special case, in the hypothetical population formed by the 
permutation of sample values. The nature of this approach precludes its being used to test 
any hypothesis other than that of independence. I propose to examine the distribution of the 
concordance coefficient W in samples from a ranking population in which concordance exists. 

In a recent paper, Daniels (1950) distinguished two types of population from which we 
may sample rankings. In his first type, we imagine the population of N individuals to be 
ranked from 1 to N according to each of m different qualities. A sample of n members is then 
drawn from it, and ranked from 1 to n for each quality in the same order as in the parent 
ranking. In Daniels’s second type, we take a sample of m rankings from a population, say M,
of rankings of 1 to n. Both types result in a sample of m rankings of 1 to n, but the generating
processes and the sampling problems involved are essentially different. 

I am here concerned solely with the second type of sampling: the distribution of W is 
considered as a problem in sampling m members from a population of M, each member of 
which bears the numbers 1 to n in some order. The problem is thus one of sampling from an 
n-variate population under the restriction that for each member the variate values are 
a permutation of the first n natural numbers. The results will then be expressed in terms of 
parameters of the population, and not, as Pitman’s were, in terms of sample statistics. 


2. I denote by $k_1$ the mean-statistic for any of the n variates. Where more than one variate
is involved in the formulae, I write $k_{10}$, $k_{01}$, etc., and denote by $\Sigma$ the operation of summation
over the n variates, permutation of subscripts always being permitted in summation.

The coefficient of concordance may be defined as the ratio of the variance of the n means
$k_1$ to the variance they would have if all rankings were identical, namely $\tfrac{1}{12}(n^2-1)$. Since the
mean of the n means $k_1$ for a given sample is always $\tfrac12(n+1)$, we have

$$W = \frac{12}{n(n^2-1)}\,\Sigma\left(k_1 - \frac{n+1}{2}\right)^2.$$

Since W is invariant under a change of origin, we may measure each $k_1$ from $\tfrac12(n+1)$, and
W becomes

$$W = \frac{12}{n(n^2-1)}\,\Sigma k_1^2. \tag{1}$$

We need only consider sampling fluctuations in $\Sigma k_1^2$, which will be called V.

In deriving the moments of samples from a finite population, I use the method developed
by Irwin & Kendall (1944). The finite population is itself regarded as a sample from a hypothetical
infinite population, and the actual sample therefore becomes a two-stage sample
from this infinite population. We denote by $E_1$ the operation of taking expectations over all
possible samples from the finite population, and by $E_2$ the operation of taking expectations
over all possible finite populations of the given size from the infinite population. Thus
$E = E_2E_1$, where E has its usual connotation over all possible samples from the infinite
population. Also, following Irwin & Kendall, we denote by $K_r$ the correspondent of $k_r$,
the rth sample k-statistic, calculated from the finite population values, and by $\kappa_r$ the cumulant
of the infinite population.
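The definition of W in terms of the centred mean ranks can be sketched in a few lines. This is an illustration only; the function name `concordance_w` and the two small data sets are hypothetical and not taken from the paper. Exact rational arithmetic is used so that the boundary values come out exactly.

```python
from fractions import Fraction

def concordance_w(rankings):
    """Concordance coefficient W for m rankings of n objects (no ties assumed)."""
    m = len(rankings)
    n = len(rankings[0])
    centre = Fraction(n + 1, 2)
    # k_1 for each object: mean rank over the m rankings, measured from (n+1)/2
    means = [Fraction(sum(r[j] for r in rankings), m) - centre for j in range(n)]
    v = sum(k * k for k in means)              # V = sum of squared centred means
    return Fraction(12, n * (n * n - 1)) * v   # equation (1)

identical = [[1, 2, 3, 4]] * 3                 # complete agreement
opposed = [[1, 2, 3, 4], [4, 3, 2, 1]]         # two exactly reversed rankings
print(concordance_w(identical), concordance_w(opposed))
```

With complete agreement the n means take the values 1 to n and W reaches its maximum; with two reversed rankings every mean equals (n+1)/2 and W vanishes.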


3. With this notation, we have

$$E_1(V) = \Sigma E_1(k_1^2) = \Sigma E_1\{(k_1^2 - K_1^2) + K_1^2\}. \tag{2}$$

Now

$$E(k_1^2) = \frac{\kappa_2}{m} + \kappa_1^2, \qquad E(K_1^2) = \frac{\kappa_2}{M} + \kappa_1^2.$$

Hence

$$E(k_1^2 - K_1^2) = \kappa_2\left(\frac{1}{m} - \frac{1}{M}\right),$$

and thus, by the Irwin-Kendall principle,

$$E_1(k_1^2 - K_1^2) = K_2\left(\frac{1}{m} - \frac{1}{M}\right).$$

Thus, from (2),

$$E_1(V) = \Sigma\left\{K_2\left(\frac{1}{m} - \frac{1}{M}\right) + K_1^2\right\}. \tag{3}$$

The somewhat unexpected appearance of terms in $K_1^2$ arises from the fact that the $k_1$'s whose
squares contribute to W have different parent means. The term $\Sigma K_1^2$ represents an inter-class
effect, being, in fact, n times the variance of parent means.

For an infinite population, $M \to \infty$, $K_r \to \kappa_r$, and

$$E(V) = \Sigma\left[\frac{\kappa_2}{m} + \kappa_1^2\right]. \tag{4}$$

If, on the other hand, the population is finite, and all rankings are equi-frequent in it, all the
$K_1$'s are zero, our origin for each being at $\tfrac12(n+1)$, and (3) reduces to

$$E_1(W) = \frac{12}{n(n^2-1)}\left(\frac{1}{m} - \frac{1}{M}\right)\Sigma K_2. \tag{5}$$

Further, all $K_2$'s are the same, and equal to $\dfrac{M}{M-1}\cdot\dfrac{n^2-1}{12}$, and hence

$$E_1(W) = \frac{M-m}{M-1}\cdot\frac{1}{m}. \tag{6}$$

This is not the result given by Pitman (1938) and Kendall (1948), namely,

$$E(W) = \frac{1}{m}.$$

They were, in fact, effectively considering sampling with replacement, and no correction for
finiteness arose. Equation (6) implies that in the type of case we are considering, the mean
value of Spearman’s coefficient $r_s$ in samples of two rankings from a finite population is not
quite zero. Using the relation for $r_s$ between the $\tfrac12 m(m-1)$ pairs of values of a set of m
rankings, namely,

$$\text{average } r_s = \frac{mW - 1}{m - 1},$$

and putting m = 2, we have $r_s = 2W - 1$, whence, from (6),

$$E_1(r_s) = -\frac{1}{M-1}. \tag{7}$$

Both (6) and (7), of course, reduce to the more usual results when M tends to infinity. The
correction for finiteness is generally negligible.
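The finite-population results (6) and (7) can be checked by brute force in a small case. The sketch below assumes the equi-frequent population consists of each of the n! rankings exactly once (so M = n!) and averages W over every sample of m distinct rankings; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations

def concordance_w(rankings):
    m, n = len(rankings), len(rankings[0])
    centre = Fraction(n + 1, 2)
    means = [Fraction(sum(r[j] for r in rankings), m) - centre for j in range(n)]
    return Fraction(12, n * (n * n - 1)) * sum(k * k for k in means)

def mean_w_without_replacement(n, m):
    """Average W over all samples of m distinct rankings; also returns M = n!."""
    population = list(permutations(range(1, n + 1)))
    samples = list(combinations(population, m))
    return sum(concordance_w(s) for s in samples) / len(samples), len(population)

n, m = 3, 2
mean_w, M = mean_w_without_replacement(n, m)
predicted = Fraction(M - m, (M - 1) * m)   # right-hand side of (6)
mean_rs = 2 * mean_w - 1                   # r_s = 2W - 1 when m = 2
print(mean_w, predicted, mean_rs)
```

For n = 3 and m = 2 the enumerated mean of W agrees with (6), and the implied mean of $r_s$ is the small negative value given by (7).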


4. It will be seen from (1) and (3) that

$$E_1(W) = \frac{12}{n(n^2-1)}\,\Sigma\left[\left(\frac{1}{m} - \frac{1}{M}\right)K_2 + K_1^2\right]. \tag{8}$$

We require the moments of W, that is to say, the expectation of powers of the right-hand
side of (8). It will readily be appreciated that the presence of terms in $K_1^2$ makes the moments
of W much more complicated than they would otherwise be. Furthermore, in taking
expectations of powers of $k_1$, we cannot take the mean of each $k_1$ to be zero, and the relations
between sampling-moments and sampling cumulants for these powers will themselves also
contain terms in $K_1$. Either of these difficulties alone would make our task laborious:
together, they imply that the sampling moments of W in the general case are altogether too
long and cumbrous to be of much practical use.

Henceforth, therefore, I consider only the case where the means of the population rankings 
are the same. This is not of course to say that we are dealing only with the case of equi- 
frequent rankings. The hypothesis of interest here is that in the population there are no 
differences between the mean rankings of the n objects; it specifies homogeneity of means 
but not homogeneity of variances. 


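5. In the equal-mean case, $E_1(W)$ is given by (5).
For the variance of V we have

$$E_1(V^2) = E_1(\Sigma k_1^2)^2 = E_1\{\Sigma k_1^4 + \Sigma\Sigma k_{10}^2 k_{01}^2\}. \tag{9}$$

Now by the ordinary relation between moments and cumulants,

$$E(k_1^4) = \kappa(1^4) + 3\kappa^2(1^2) = \frac{\kappa_4}{m^3} + \frac{3\kappa_2^2}{m^2}.$$

$E_1(k_1^4)$ is that function of the $K_r$ whose expectation is $E(k_1^4)$. We know that

$$\begin{aligned}
E_2(K_4) &= \kappa_4,\\
E_2(K_2^2) &= \kappa(K_2^2) + \kappa_2^2 = \frac{\kappa_4}{M} + \frac{M+1}{M-1}\,\kappa_2^2.
\end{aligned} \tag{10}$$

It is clear from (10) that

$$E_2\left\{\frac{M-1}{M+1}\,\frac{3}{m^2}\,K_2^2 + \left[\frac{1}{m^3} - \frac{M-1}{M+1}\,\frac{3}{m^2M}\right]K_4\right\} = \frac{\kappa_4}{m^3} + \frac{3\kappa_2^2}{m^2}. \tag{11}$$

It follows that

$$E_1(k_1^4) = \frac{M-1}{M+1}\,\frac{3}{m^2}\,K_2^2 + \left[\frac{1}{m^3} - \frac{3}{m^2M}\,\frac{M-1}{M+1}\right]K_4. \tag{12}$$

The process for deriving $E_1(k_{10}^2 k_{01}^2)$ is similar but more complicated. We start from the
relation

$$\mu_{22} = \kappa_{22} + 2\kappa_{11}^2 + \kappa_{20}\kappa_{02},$$

which may be obtained by Kendall’s (1940) operative process. Applying this, we have

$$E(k_{10}^2 k_{01}^2) = \kappa\begin{pmatrix}1&1&0&0\\0&0&1&1\end{pmatrix} + 2\kappa^2\begin{pmatrix}1&0\\0&1\end{pmatrix} + \kappa\begin{pmatrix}1&1\\0&0\end{pmatrix}\kappa\begin{pmatrix}0&0\\1&1\end{pmatrix}. \tag{13}$$

Using Rule 10 for the sampling cumulants of k-statistics (Kendall, 1943, p. 265), we find

$$\begin{aligned}
\kappa\begin{pmatrix}1&1&0&0\\0&0&1&1\end{pmatrix} &= \frac{\kappa_{22}}{m^3},\\
\kappa\begin{pmatrix}1&0\\0&1\end{pmatrix} &= \frac{\kappa_{11}}{m},\\
\kappa\begin{pmatrix}1&1\\0&0\end{pmatrix} &= \frac{\kappa_{20}}{m}.
\end{aligned} \tag{14}$$

Substituting (14) in (13), we obtain

$$E(k_{10}^2 k_{01}^2) = \frac{\kappa_{22}}{m^3} + \frac{2\kappa_{11}^2}{m^2} + \frac{\kappa_{20}\kappa_{02}}{m^2}. \tag{15}$$

We have that

$$\begin{aligned}
E_2(K_{22}) &= \kappa_{22},\\
E_2(K_{11}^2) &= \kappa(K_{11}^2) + \kappa_{11}^2 = \frac{\kappa_{22}}{M} + \frac{M}{M-1}\,\kappa_{11}^2 + \frac{\kappa_{20}\kappa_{02}}{M-1},\\
E_2(K_{20}K_{02}) &= \kappa(K_{20}K_{02}) + \kappa_{20}\kappa_{02} = \frac{\kappa_{22}}{M} + \frac{2\kappa_{11}^2}{M-1} + \kappa_{20}\kappa_{02},
\end{aligned} \tag{16}$$

where the sampling cumulants have been evaluated by Kendall’s operative process.

For the right-hand side of (15) we require the coefficient of $\kappa_{11}^2$ to be twice that of $\kappa_{20}\kappa_{02}$.
In (16) these coefficients are

                          κ_11²        κ_20 κ_02
(a) E_2(K_11²)            M/(M-1)      1/(M-1)
(b) E_2(K_20 K_02)        2/(M-1)      1

so we need only solve the equation

$$a\,\frac{M}{M-1} + b\,\frac{2}{M-1} = 2\left\{a\,\frac{1}{M-1} + b\right\},$$

which gives $a = 2b$.

We therefore have

$$E_2\{2K_{11}^2 + K_{20}K_{02}\} = \frac{3\kappa_{22}}{M} + 2\left(\frac{M+1}{M-1}\right)\kappa_{11}^2 + \left(\frac{M+1}{M-1}\right)\kappa_{20}\kappa_{02},$$

whence the right-hand side of (15) is equal to

$$E_2\left\{\left(\frac{M-1}{M+1}\right)\frac{1}{m^2}\,(2K_{11}^2 + K_{20}K_{02}) + \left[\frac{1}{m^3} - \left(\frac{M-1}{M+1}\right)\frac{3}{m^2M}\right]K_{22}\right\} \tag{17}$$

and

$$E_1(k_{10}^2 k_{01}^2) = \left(\frac{M-1}{M+1}\right)\frac{1}{m^2}\,(2K_{11}^2 + K_{20}K_{02}) + \left[\frac{1}{m^3} - \left(\frac{M-1}{M+1}\right)\frac{3}{m^2M}\right]K_{22}. \tag{18}$$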


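6. If we now substitute the results (12) and (18) in (9), we obtain, on subtracting $\{E_1(V)\}^2$,

$$\begin{aligned}
\mathrm{var}_1(V) = {} & \Sigma\left\{\left[\frac{1}{m^3} - \frac{3}{m^2M}\left(\frac{M-1}{M+1}\right)\right]K_4 + \left[\frac{3}{m^2}\left(\frac{M-1}{M+1}\right) - \left(\frac{1}{m} - \frac{1}{M}\right)^2\right]K_2^2\right\}\\
& + \Sigma\Sigma\left\{\left[\frac{1}{m^3} - \frac{3}{m^2M}\left(\frac{M-1}{M+1}\right)\right]K_{22} + \frac{2}{m^2}\left(\frac{M-1}{M+1}\right)K_{11}^2 + \left[\frac{1}{m^2}\left(\frac{M-1}{M+1}\right) - \left(\frac{1}{m} - \frac{1}{M}\right)^2\right]K_{20}K_{02}\right\}.
\end{aligned} \tag{19}$$

The variance of W is, of course, $\left\{\dfrac{12}{n(n^2-1)}\right\}^2$ times (19). If the population is infinite, (19) gives

$$\mathrm{var}(W) = \left\{\frac{12}{n(n^2-1)}\right\}^2\left\{\Sigma\left(\frac{\kappa_4}{m^3} + \frac{2\kappa_2^2}{m^2}\right) + \Sigma\Sigma\left(\frac{\kappa_{22}}{m^3} + \frac{2\kappa_{11}^2}{m^2}\right)\right\}. \tag{20}$$

If, in addition, all rankings are equally frequent in the population, we have (Stuart, 1950)

$$\kappa_2 = \tfrac{1}{12}(n^2 - 1), \qquad \kappa_4 = \tfrac{1}{120}(1 - n^4), \tag{21}$$

while it is easy to show that

$$\kappa_{11} = -\tfrac{1}{12}(n + 1), \qquad \kappa_{22} = -\tfrac{1}{360}(n + 1)(n + 3)(2n - 1). \tag{22}$$

Substituting (21) and (22) in (20), we obtain for the infinite equi-frequent population

$$\mathrm{var}(W) = \frac{2(m-1)}{m^3(n-1)}, \tag{23}$$

which is Pitman’s (1938) result.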
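The cumulants (21) and (22) and the reduction of (20) to (23) can be verified numerically. The following is a sketch under two stated assumptions: the double sum in (20) runs over the n(n-1) ordered pairs of distinct objects, and the joint distribution of two ranks is uniform over ordered pairs of distinct integers from 1 to n. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def rank_cumulants(n):
    """kappa_2, kappa_4, kappa_11, kappa_22 of ranks in a uniformly random ranking of n objects."""
    centre = Fraction(n + 1, 2)
    pairs = [(x - centre, y - centre) for x, y in permutations(range(1, n + 1), 2)]
    count = len(pairs)
    mean = lambda f: sum(f(x, y) for x, y in pairs) / count
    mu2 = mean(lambda x, y: x * x)
    mu4 = mean(lambda x, y: x ** 4)
    mu11 = mean(lambda x, y: x * y)
    mu22 = mean(lambda x, y: x * x * y * y)
    # mu_4 = kappa_4 + 3 kappa_2^2 and mu_22 = kappa_22 + 2 kappa_11^2 + kappa_20 kappa_02
    return mu2, mu4 - 3 * mu2 ** 2, mu11, mu22 - 2 * mu11 ** 2 - mu2 * mu2

def var_w_infinite(n, m):
    """Equation (20) evaluated for the infinite equi-frequent population."""
    k2, k4, k11, k22 = rank_cumulants(n)
    single = n * (k4 / m ** 3 + 2 * k2 ** 2 / m ** 2)
    double = n * (n - 1) * (k22 / m ** 3 + 2 * k11 ** 2 / m ** 2)
    return Fraction(12, n * (n * n - 1)) ** 2 * (single + double)

print(rank_cumulants(5))
print(var_w_infinite(5, 4), Fraction(2 * (4 - 1), 4 ** 3 * (5 - 1)))
```

Exact fractions are used so that agreement with (21) to (23) is an identity check rather than a floating-point comparison.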


7. It would evidently be a straightforward, although extremely laborious, task to obtain 
the higher moments of W for a finite population, but the expression for the variance (19) 
suggests that they would be too much complicated by the finite population factors to be of 
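use. We therefore confine ourselves henceforth to infinite populations.

For the third moment of V, we have

$$E\{V - E(V)\}^3 = E\left\{\Sigma\left(k_1^2 - \frac{\kappa_2}{m}\right)\right\}^3$$

and on expanding the bracket, substituting for the sampling moments in terms of sampling
cumulants by means of the relations

$$\begin{aligned}
\mu_6 &= \kappa_6 + 15\kappa_4\kappa_2 + 10\kappa_3^2 + 15\kappa_2^3,\\
\mu_{42} &= \kappa_{42} + \kappa_{40}\kappa_{02} + 8\kappa_{31}\kappa_{11} + 4\kappa_{30}\kappa_{12} + 6\kappa_{22}\kappa_{20} + 6\kappa_{21}^2 + 3\kappa_{20}^2\kappa_{02} + 12\kappa_{20}\kappa_{11}^2,\\
\mu_{222} &= \kappa_{222} + \kappa_{220}\kappa_{002} + \kappa_{202}\kappa_{020} + \kappa_{022}\kappa_{200} + 4\kappa_{211}\kappa_{011} + 4\kappa_{121}\kappa_{101} + 4\kappa_{112}\kappa_{110} + 2\kappa_{210}\kappa_{012}\\
&\quad + 2\kappa_{201}\kappa_{021} + 2\kappa_{120}\kappa_{102} + 4\kappa_{111}^2 + \kappa_{200}\kappa_{020}\kappa_{002} + 2\kappa_{200}\kappa_{011}^2 + 2\kappa_{020}\kappa_{101}^2\\
&\quad + 2\kappa_{002}\kappa_{110}^2 + 8\kappa_{110}\kappa_{101}\kappa_{011}
\end{aligned} \tag{24}$$

($\mu_{42}$ and $\mu_{222}$ being obtained from $\mu_6$ by Kendall’s operative process), we find, on evaluating
the sampling cumulants as before,

$$\begin{aligned}
E\{V - E(V)\}^3 = {} & \Sigma\left(\frac{\kappa_6}{m^5} + \frac{12\kappa_4\kappa_2}{m^4} + \frac{10\kappa_3^2}{m^4} + \frac{8\kappa_2^3}{m^3}\right)\\
& + 3\Sigma\Sigma\left(\frac{\kappa_{42}}{m^5} + \frac{8\kappa_{31}\kappa_{11}}{m^4} + \frac{4\kappa_{30}\kappa_{12}}{m^4} + \frac{4\kappa_{22}\kappa_{20}}{m^4} + \frac{6\kappa_{21}^2}{m^4} + \frac{8\kappa_{20}\kappa_{11}^2}{m^3}\right)\\
& + \Sigma\Sigma\Sigma\left(\frac{\kappa_{222}}{m^5} + \frac{12\kappa_{211}\kappa_{011}}{m^4} + \frac{6\kappa_{210}\kappa_{012}}{m^4} + \frac{4\kappa_{111}^2}{m^4} + \frac{8\kappa_{110}\kappa_{101}\kappa_{011}}{m^3}\right).
\end{aligned} \tag{25}$$

As might be expected, the expression for the fourth moment of V is exceedingly complicated,
involving sampling moments of the $k_1$’s like $\mu_8$, $\mu_{62}$, $\mu_{44}$, $\mu_{422}$ and $\mu_{2222}$. The formulae
connecting these moments with sampling cumulants would alone occupy a page of this journal.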










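I have therefore evaluated the fourth moment to order $(m^{-4})$, and omitting all details, give
only the result:

$$\begin{aligned}
\mu_4(V) = \frac{12}{m^4}\{ & 5\Sigma\kappa_2^4 + \Sigma\Sigma(\kappa_{20}^2\kappa_{02}^2 + 8\kappa_{20}\kappa_{02}\kappa_{11}^2 + 6\kappa_{11}^4 + 20\kappa_{20}^2\kappa_{11}^2)\\
& + 2\Sigma\Sigma\Sigma(\kappa_{200}^2\kappa_{011}^2 + 8\kappa_{200}\kappa_{110}\kappa_{101}\kappa_{011} + 6\kappa_{110}^2\kappa_{101}^2)\\
& + \Sigma\Sigma\Sigma\Sigma(\kappa_{1100}^2\kappa_{0011}^2 + 4\kappa_{1100}\kappa_{1010}\kappa_{0101}\kappa_{0011})\}.
\end{aligned} \tag{26}$$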


8. Each of the moments of V is a sum of products of cumulants of the same total order.
We may, as an approximation, neglect terms below a certain order in m. Now the only
terms in $\mu_3(V)$ involving cumulants of higher order than 2 are of lower order than $(m^{-3})$, as
can be seen from (25). Similarly, (20) shows that in the variance terms down to order $(m^{-2})$
contain only second order cumulants, while the mean value of V is in any case dependent
only on the second order cumulants. If we approximate by neglecting lower orders than those
indicated, we know that if it were possible for the distributions of rankings allotted in the
population for the n objects to be of the normal form, our approximations would in fact be
exact expressions since all cumulants of order greater than two would then be zero. Thus if
the distributions are unimodal, and not too far from normality, we may expect our approximations
to be fairly good for moderate sample sizes. We find, to the above orders of
approximation,

$$E(V) = \frac{1}{m}\,\Sigma\kappa_2, \qquad \mathrm{var}(V) = \frac{2}{m^2}\{\Sigma\kappa_2^2 + \Sigma\Sigma\kappa_{11}^2\}, \tag{27}$$

and

$$\mu_3(V) = \frac{8}{m^3}\{\Sigma\kappa_2^3 + 3\Sigma\Sigma\kappa_{20}\kappa_{11}^2 + \Sigma\Sigma\Sigma\kappa_{110}\kappa_{101}\kappa_{011}\}. \tag{28}$$

We check (26) and (28) by substituting in them the values (21) and (22) for the equi-frequent
population. We find

$$\mu_3(W) = \frac{8}{m^3(n-1)^2}, \tag{29}$$

$$\mu_4(W) = \frac{12(n+3)}{m^4(n-1)^3}, \tag{30}$$

which agree with Pitman’s results to the same orders in m.
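The check described above can be reproduced with exact arithmetic. This sketch assumes that each multiple sum runs over ordered selections of distinct objects, and that the leading-order third and fourth moments of V have the normal-theory form, in which the coefficient of the $\kappa_{110}^2\kappa_{101}^2$ term inside the triple sum is 6. The function name is hypothetical.

```python
from fractions import Fraction

def w_moments_equifrequent(n, m):
    """Leading-order mu_3(W) and mu_4(W) for the infinite equi-frequent population."""
    a = Fraction(n * n - 1, 12)       # kappa_2 of every object, from (21)
    c = Fraction(-(n + 1), 12)        # kappa_11 of every pair, from (22)
    # numbers of ordered selections of 1, 2, 3 and 4 distinct objects
    s1, s2 = n, n * (n - 1)
    s3, s4 = s2 * (n - 2), s2 * (n - 2) * (n - 3)
    mu3_v = Fraction(8, m ** 3) * (s1 * a ** 3 + 3 * s2 * a * c ** 2 + s3 * c ** 3)
    mu4_v = Fraction(12, m ** 4) * (
        5 * s1 * a ** 4
        + s2 * (a ** 4 + 8 * a ** 2 * c ** 2 + 6 * c ** 4 + 20 * a ** 2 * c ** 2)
        + 2 * s3 * (a ** 2 * c ** 2 + 8 * a * c ** 3 + 6 * c ** 4)
        + s4 * (c ** 4 + 4 * c ** 4)
    )
    scale = Fraction(12, n * (n * n - 1))   # W = scale * V
    return scale ** 3 * mu3_v, scale ** 4 * mu4_v

print(w_moments_equifrequent(8, 50))
```

The two returned values can be compared directly with the closed forms $8/\{m^3(n-1)^2\}$ and $12(n+3)/\{m^4(n-1)^3\}$.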


9. We can use the expressions (26)-(28) to test the hypothesis that the n objects have
equal mean ranks $\tfrac12(n + 1)$ in the population, and, if this hypothesis is not discredited, to test
further hypotheses about the second-order cumulants of the population, its variances and 
covariances. We may, for example, wish to test whether the rank allotted to one object 
influences the rank allotted to another, i.e. whether their covariance is zero; or we may wish 
to test whether the variance of one ranking is greater than that of another. In either case, 
we can insert hypothetical values for the characteristics under test into the moments of W, 
and estimate the remaining cumulants by the k-statistics of the sample. We can then fit 
a Pearson frequency-curve by the methods of moments, find significance points by some 
suitable transformation, and see whether the observed value falls outside these points. If so,
the hypothesis is rejected. 

We must, in all cases, first test whether the means can have been equal in the population. 
If this is not so, the distribution of W can be of no further practical use to us. Even in this 
limited case of the distribution, the computation involved is heavy enough, as will be seen in 
a later illustrative example. 










If we wish to test only the narrower equi-frequent hypothesis, we can use Friedman’s 
(1937) result that $m(n-1)W$ is distributed like $\chi^2$ with (n - 1) degrees of freedom. Of course,
if this is not rejected, the wider equal-mean hypothesis must also stand, while if it is over- 
whelmingly rejected, the latter is also likely to be rejected. In view of the fairly heavy 
computation involved, it is worth making the simple equi-frequent test first to avoid 
unnecessary computation. 


10. We consider now the practical problem which gave rise to the preceding theory. 
A recent inquiry by the Social Research Division of the London School of Economics, which 
is reported by Hall & Caradog Jones (1950), was concerned with ideas of social stratification. 
A sample of approximately 1000 people were asked to rank 30 selected occupations in terms 
of ‘social class’ which was not defined. Effectively, the mean rank allotted to each occupation 
was calculated, and the problem then arose: How far were the differences between the sample 
mean ranks significant of real differences in the population? 

It is clear that, as above, we can treat this as an ordinary case of sampling from a population
of rankers. We, therefore, have at once that, for any object, the sample mean rank is an
unbiased estimator of the population mean, and that its variance is given by

$$\mathrm{var}\,k_1 = \frac{M-m}{M-1}\cdot\frac{\kappa_2}{m}. \tag{31}$$

We have also that the covariance of any two object means is a similar function of the
covariance of the ranks allotted to these objects in the population, i.e.

$$\mathrm{cov}(k_{10}, k_{01}) = \frac{M-m}{M-1}\cdot\frac{\kappa_{11}}{m}. \tag{32}$$

Where the number of objects being ranked is as large as here, we expect the difference
between two mean ranks to be distributed approximately normally by a central limit effect,
with variance obtained from (31) and (32) as

$$\mathrm{var}(k_{10} - k_{01}) = \frac{M-m}{M-1}\cdot\frac{1}{m}\,(\kappa_{20} + \kappa_{02} - 2\kappa_{11}). \tag{33}$$

For an infinite population we obtain from this, on substitution of sample estimators,

$$\mathrm{var}(k_{10} - k_{01}) = \frac{1}{m}\,(k_{20} + k_{02} - 2k_{11}). \tag{34}$$


For n = 2, this reduces to the binomial test, as it should. Several comments need to be made
about this test, quite apart from the emphasis on its approximate character. First, we can
only test means in pairs, and $\binom{n}{2}$ may be a large number. We can avoid most of the
labour in practice, for after a few obvious tests have been carried out, the eye is often enough
to test widely separated means. Secondly, the terms in (34) will in general be different for
each pair of objects: we cannot automatically conclude, if $|k_{100} - k_{010}|$ is significant and
$|k_{100} - k_{001}| > |k_{100} - k_{010}|$, that $|k_{100} - k_{001}|$ will also be significant. Nevertheless, this is
very generally true. Thirdly, as always where non-independent tests are involved, we cannot
directly compound probabilities, but must rely to a certain extent on common-sense
judgement.

11. We chose a non-random sample of 50 rankings from the data obtained in the original 


inquiry. Further, to reduce computation, only eight occupations are considered, their 
original rankings having been renumbered from 1 to 8. This procedure is permissible, since










we require the data for illustrative purposes only. We thus have a sample of 50 rankings of 
eight occupations, these being 


A: Bricklayer 
B: Chartered Accountant 


C: Chef 


D: Civil Servant 


E: Commercial Traveller 
F: Elementary School-teacher 


G: Farmer 
H: Tractor Driver 


We first wish to test whether this sample could have arisen from a population in which all 
occupations had the same mean rank. The Friedman $\chi^2$ test is so highly significant that we
should not normally carry out the equal-mean test, but we proceed in this case as an illustration. 
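The size of the Friedman statistic for these data can be sketched from the mean ranks alone. The eight mean ranks below are those quoted for occupations A to H in Table 1 of this paper with m = 50; the function name is hypothetical, and the statistic m(n-1)W is simply computed, not referred to tables here.

```python
def friedman_statistic(mean_ranks, m):
    """Return m(n-1)W and its degrees of freedom from the n mean ranks of m rankings."""
    n = len(mean_ranks)
    centre = (n + 1) / 2
    v = sum((k - centre) ** 2 for k in mean_ranks)   # V, sum of squared centred means
    w = 12 * v / (n * (n * n - 1))                   # concordance coefficient
    return m * (n - 1) * w, n - 1

# Mean ranks of occupations A..H as quoted in Table 1 (m = 50 rankings).
means = [7.02, 1.36, 5.52, 2.30, 5.14, 4.14, 3.04, 7.48]
stat, dof = friedman_statistic(means, 50)
print(round(stat, 2), dof)
```

A value of this size on 7 degrees of freedom lies far beyond any conventional significance point, in line with the remark in the text.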


12. From the 50 rankings, we find the eight $k_1$, eight $k_2$ and twenty-eight $k_{11}$ to be:

Table 1

Occupation   k_1    k_2      k_11 with occupation
                             A    B         C         D        E         F         G         H
A            7.02   0.8771   -    -0.1072   -0.1304   -0.126   -0.4828   +0.0172   +0.1792   -0.2096
B            1.36   0.4392   -    -         +0.1128   -0.268   +0.2296   -0.0304   -0.4744   -0.1928
C            5.52   1.4792   -    -         -         -0.356   -0.1328   -0.7728   -0.0208   -0.1496
D            2.30   0.7857   -    -         -         -        +0.118    +0.158    -0.372    +0.076
E            5.14   1.3882   -    -         -         -        -         -0.1596   -0.7456   -0.1872
F            4.14   1.3473   -    -         -         -        -         -         -0.5456   +0.0128
G            3.04   1.6310   -    -         -         -        -         -         -         +0.0808
H            7.48   0.5812   -    -         -         -        -         -         -         -











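From these values we calculate estimates of the quantities involved in equations (26)-(28)
as follows:

$$\begin{aligned}
\Sigma k_2 &= 8.5290, & \Sigma\Sigma\Sigma k_{110}k_{101}k_{011} &= 0.8310,\\
\Sigma k_2^2 &= 11.2559, & \Sigma\Sigma\Sigma k_{200}^2k_{011}^2 &= 39.1475,\\
\Sigma k_2^3 &= 15.1754, & \Sigma\Sigma\Sigma k_{200}k_{110}k_{101}k_{011} &= 1.4599,\\
\Sigma k_2^4 &= 21.4386, & \Sigma\Sigma\Sigma k_{110}^2k_{101}^2 &= 1.3824,\\
\Sigma\Sigma k_{11}^2 &= 2.6224, & \Sigma\Sigma\Sigma\Sigma k_{1100}^2k_{0011}^2 &= 15.1979,\\
\Sigma\Sigma k_{20}k_{11}^2 &= 6.6862, & \Sigma\Sigma\Sigma\Sigma k_{1100}k_{1010}k_{0101}k_{0011} &= 0.4314.\\
\Sigma\Sigma k_{20}^2k_{02}^2 &= 62.2355,\\
\Sigma\Sigma k_{20}k_{02}k_{11}^2 &= 8.1836,\\
\Sigma\Sigma k_{11}^4 &= 1.8224,\\
\Sigma\Sigma k_{20}^2k_{11}^2 &= 9.9432,
\end{aligned}$$

We then find, for our estimates of the moments of V,

$$E(V) = 0.17058, \quad \mathrm{var}(V) = 0.01110, \quad \mu_3(V) = 0.00231, \quad \mu_4(V) = 0.00111,$$

giving

$$\beta_1 = 3.8950, \qquad \beta_2 = 9.0334.$$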
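As a rough check on the arithmetic, the moments can be recomputed from the sixteen quoted sums. The sketch below rests on assumptions: the pairing of each numerical value with a particular sum is a reading of the printed list, the leading-order formulae are taken in their normal-theory form (fourth moment equal to $12/m^4$ times a bracket in which the $k_{110}^2k_{101}^2$ term carries coefficient 6), and the dictionary keys are hypothetical labels.

```python
m = 50
# Sums quoted in the text; keys are informal labels for the summands.
sums = {
    "k2": 8.5290, "k2^2": 11.2559, "k2^3": 15.1754, "k2^4": 21.4386,
    "k11^2": 2.6224, "k20 k11^2": 6.6862, "k20^2 k02^2": 62.2355,
    "k20 k02 k11^2": 8.1836, "k11^4": 1.8224, "k20^2 k11^2": 9.9432,
    "k110 k101 k011": 0.8310, "k200^2 k011^2": 39.1475,
    "k200 k110 k101 k011": 1.4599, "k110^2 k101^2": 1.3824,
    "k1100^2 k0011^2": 15.1979, "k1100 k1010 k0101 k0011": 0.4314,
}

mean_v = sums["k2"] / m
var_v = 2 / m ** 2 * (sums["k2^2"] + sums["k11^2"])
mu3_v = 8 / m ** 3 * (sums["k2^3"] + 3 * sums["k20 k11^2"] + sums["k110 k101 k011"])
mu4_v = 12 / m ** 4 * (
    5 * sums["k2^4"]
    + sums["k20^2 k02^2"] + 8 * sums["k20 k02 k11^2"]
    + 6 * sums["k11^4"] + 20 * sums["k20^2 k11^2"]
    + 2 * (sums["k200^2 k011^2"] + 8 * sums["k200 k110 k101 k011"]
           + 6 * sums["k110^2 k101^2"])
    + sums["k1100^2 k0011^2"] + 4 * sums["k1100 k1010 k0101 k0011"]
)
beta1 = mu3_v ** 2 / var_v ** 3      # squared skewness
beta2 = mu4_v / var_v ** 2           # kurtosis
print(round(mean_v, 5), round(var_v, 5), round(mu3_v, 5), round(mu4_v, 5))
print(round(beta1, 3), round(beta2, 3), round(2 * beta2 - 3 * beta1 - 6, 3))
```

The recomputed values agree with those quoted to within the rounding of the four-decimal inputs.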










The criterion $\kappa$ (Elderton, 1938, chart facing p. 51) is found to be greater than unity, and

$$2\beta_2 - 3\beta_1 - 6 = 0.3818.$$

For a Type III curve, this quantity should be zero, so bearing in mind that we have only
estimated the moments, a Type III curve should give a reasonably good fit. V is therefore
distributed in the form, with origin at its mean,

$$dP \propto \left(1 + \frac{V}{A}\right)^{\gamma a} e^{-\gamma V}\,dV.$$

The substitution $(1 + V/A) = x$ shows that $2\gamma A x$ is distributed like $\chi^2$ with $2(\gamma a + 1)$ degrees
of freedom.
From Elderton’s formulae, we find

$$\gamma A = \gamma a + 1 = 1.02696, \qquad \gamma = 9.61039, \qquad A = 0.10686.$$

Writing (0.17058 + V) for V to bring the origin of measurement to zero, we see that
19.22078(0.27744 + V) may be tested in the $\chi^2$ distribution with 2.05 degrees of freedom.
The observed value of $V = \Sigma\left(k_1 - \dfrac{n+1}{2}\right)^2$ is 33.6416.
The resulting value of $\chi^2$, which is over 600, is extremely significant. We conclude that the
sample could not have arisen from a population with equal mean ranks.
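The constants of the fitted curve can be retraced from the quoted values. This sketch assumes the standard property of a Type III (gamma-type) curve that $\beta_1 = 4/(\gamma a + 1)$, takes $\gamma$ as quoted, and evaluates the test expression exactly as it is printed in the text; the variable names are hypothetical.

```python
beta1 = 3.8950      # quoted in the text
gamma = 9.61039     # quoted in the text
mean_v = 0.17058    # quoted estimate of E(V)

shape = 4 / beta1   # gamma*a + 1, since a gamma-type curve has beta_1 = 4 / shape
A = shape / gamma   # so that gamma * A = gamma * a + 1
dof = 2 * shape     # degrees of freedom of the chi-square form

# Observed V from the mean ranks of occupations A..H in Table 1.
mean_ranks = [7.02, 1.36, 5.52, 2.30, 5.14, 4.14, 3.04, 7.48]
n = len(mean_ranks)
v_obs = sum((k - (n + 1) / 2) ** 2 for k in mean_ranks)

# Test expression as printed: 2*gamma*(A + mean + V).
chi2_value = 2 * gamma * (A + mean_v + v_obs)
print(round(shape, 5), round(A, 5), round(dof, 2), round(v_obs, 4), round(chi2_value, 1))
```

Whatever the exact origin adjustment, the observed V is some three hundred times the standard deviation of V, so the conclusion does not depend on the fine detail of the fitted curve.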


13. Having established the heterogeneity of the means taken all together, we can use (34) 
to test their homogeneity in groups. 

We first consider the group BDG, with mean ranks of 1.36, 2.30 and 3.04 in our sample.
From Table 1, we find:

Objects   Difference of means   Standard error
BD        0.94                  0.19
DG        0.74                  0.25
BG        1.68                  0.25

















All three differences are significant at the 1 % level, and the group is therefore regarded as 
heterogeneous. 

For the group ECAH, with mean ranks 5.14, 5.52, 7.02 and 7.48, we see at once that all
non-adjacent pairs are significantly different. The adjacent pairs give the table:

Objects   Difference of means   Standard error
EC        0.38                  0.25
CA        1.10                  0.23
AH        0.46                  0.20

















EC is not significant, CA is highly significant, and AH is roughly significant at the 1 % 


level. Thus of the six pairs, four differences are highly significant, and the group is adjudged 
heterogeneous. 
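The standard errors for the group BDG follow from the infinite-population form of the variance of a difference of two mean ranks, $(k_{20} + k_{02} - 2k_{11})/m$. A minimal sketch, using the Table 1 entries for B, D and G with m = 50 (the dictionary layout and function name are hypothetical):

```python
from math import sqrt

m = 50
k1 = {"B": 1.36, "D": 2.30, "G": 3.04}                  # mean ranks
k2 = {"B": 0.4392, "D": 0.7857, "G": 1.6310}            # variances
k11 = {("B", "D"): -0.268, ("D", "G"): -0.372, ("B", "G"): -0.4744}  # covariances

def pair_test(x, y):
    """Difference of mean ranks, its standard error, and their ratio."""
    diff = abs(k1[x] - k1[y])
    se = sqrt((k2[x] + k2[y] - 2 * k11[(x, y)]) / m)
    return diff, se, diff / se

for pair in k11:
    diff, se, ratio = pair_test(*pair)
    print(pair, round(diff, 2), round(se, 2), round(ratio, 2))
```

Each ratio exceeds the two-sided 1 % normal point of about 2.58, which is the sense in which all three differences are called significant.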













14. Summary. A test is proposed to decide whether a set of m rankings of n objects can 
probably have arisen from a population in which the mean rankings of the objects were 


equal. A second, more approximate test, can then be used to examine the mean ranks of the 
objects in pairs and larger groups. 


I am much indebted to Prof. M. G. Kendall for criticism and advice during the preparation 
of this paper. 


REFERENCES 


DANIELS, H. E. (1950). Rank correlation and population models. J. R. Statist. Soc. B, 12, 171.

ELDERTON, W. P. (1938). Frequency Curves and Correlation, 3rd ed. Cambridge University Press.

FRIEDMAN, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis
of variance. J. Amer. Statist. Ass. 32, 675.

HALL, J. & CARADOG JONES, D. (1950). Social grading of occupations. Brit. J. Sociology, 1, 31.

IRWIN, J. O. & KENDALL, M. G. (1944). Sampling moments of moments for a finite population.
Ann. Eugen., Lond., 12, 138.

KENDALL, M. G. (1940). The derivation of multivariate sampling formulae from univariate formulae by
symbolic operation. Ann. Eugen., Lond., 10, 392.

KENDALL, M. G. (1943). The Advanced Theory of Statistics, 1. London: Charles Griffin and Co.

KENDALL, M. G. (1948). Rank Correlation Methods. London: Charles Griffin and Co.

PITMAN, E. J. G. (1938). The analysis of variance test. Biometrika, 29, 332.

STUART, A. (1950). The cumulants of the first n natural numbers. Biometrika, 37, 446.







THE EFFECT OF NON-NORMALITY ON THE POWER FUNCTION 
OF THE F-TEST IN THE ANALYSIS OF VARIANCE 


By F. N. DAVID AND N. L. JOHNSON


1. The effect of departure from normality in the distribution of the residual, or error 
term, on the significance levels of z (or F) used in the analysis of variance has been studied
for the simple case of a one-way classification by Pearson (1931), Geary (1947) and lately
by Gayen (1950). The power function of the F-test, based on the distribution of F when the
null hypothesis of equal means is not true, was considered by Tang (1938). More recently
Patnaik (1949) developed an approximate method of obtaining the power function of F based
on a $\chi^2$ approximation to the non-central $\chi^2$ distribution. Both Tang and Patnaik assumed
that the error terms were normally distributed with a common variance from group to 
group. It has seemed of interest to us to attack the problem in a more general way by 
studying the distribution of F when the expectations, the variances, and also the higher 
cumulants of the distributions of the error terms may vary from group to group. Our results 
cover as a special (simplified) case the interesting problem of the power function in the 
analysis of variance when there are unequal ‘within group’ standard deviations. This has 
been studied, for normal variation, by Quensel (1947). In this paper we give our theoretical 
approach and results, and some slight illustration. The main numerical results are reserved 
for a further publication. 


2. While giving some indication in §4 of how the theory may be extended, we confine 
ourselves here to the one-way classification. If it is supposed that there are available s groups
containing $n_1, n_2, \ldots, n_s$ observations respectively, where

$$\sum_{t=1}^{s} n_t = N,$$

then the one-way classification leads to the following identity:

$$\sum_{t=1}^{s} n_t(\bar{x}_{t.} - \bar{x}_{..})^2 + \sum_{t=1}^{s}\sum_{i=1}^{n_t}(x_{ti} - \bar{x}_{t.})^2 = \sum_{t=1}^{s}\sum_{i=1}^{n_t}(x_{ti} - \bar{x}_{..})^2.$$

It is usual in analysis of variance tests to assume that all the variables $x_{ti}$ have the same
normal distribution. We shall generalize this by assuming that the $n_t$ variables

$$x_{ti} \quad (t = 1, 2, \ldots, s) \quad (i = 1, 2, \ldots, n_t)$$

each have the same distribution which we may specify by as many cumulants $\kappa_{1t}, \kappa_{2t}, \ldots, \kappa_{rt}, \ldots$
as desired, but that the values of these cumulants may be different for different groups. It
will be noticed we are assuming that if the $\kappa_{1t}$ are not equal (i.e. if the null hypothesis does
not hold), systematic rather than random effects are present so that the $\kappa_{1t}$ are treated as
parameters. For brevity we shall write

$$S_1 = \sum_{t=1}^{s} n_t(\bar{x}_{t.} - \bar{x}_{..})^2, \qquad S_2 = \sum_{t=1}^{s}\sum_{i=1}^{n_t}(x_{ti} - \bar{x}_{t.})^2$$

and

$$f_1 = s - 1, \qquad f_2 = N - s$$

for the degrees of freedom of $S_1$ and $S_2$ respectively.













Ideally we should investigate the ratio

$$F = (S_1/f_1)/(S_2/f_2)$$

but it is difficult to obtain its distribution for other than the normal case, and its moments
can usually only be obtained as approximations. In the course of our work we shall indicate
how approximations to the moments of the distribution of z (or F) may be obtained, but for
the main part we shall rewrite the ratio in order to work with exact quantities.


3. Let $F_\alpha$ be the value of F at the normal probability level $\alpha$. If the usual null normal
hypothesis is true then

$$P\{(S_1/f_1)/(S_2/f_2) > F_\alpha\} = \alpha$$

or

$$P\left\{\left(S_1 - \frac{f_1}{f_2}F_\alpha S_2\right) > 0\right\} = \alpha$$

or, writing

$$-\frac{f_1}{f_2}F_\alpha = a,$$

where a is a constant for given $f_1$, $f_2$ and $\alpha$, we have

$$P\{S_1 + aS_2 > 0\} = \alpha.$$

This suggests that we should consider the distribution of $S_1 + aS_2$ in the non-normal case.
The actual distribution of this quantity is difficult if not impossible to obtain, but the exact
moments of the distribution are a matter of simple, if laborious, algebra. We have found these
moments and our procedure will be to fit a Pearson (or some other suitable) curve to these
moments. We shall have therefore, for each $f_1$ and $f_2$, to fit a new curve for each separate
value chosen for $\alpha$, but, provided a suitable system of curves is chosen for graduation purposes,
the labour involved is not heavy.
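The equivalence between the event $F > F_\alpha$ and the event $S_1 + aS_2 > 0$ can be illustrated directly. In the sketch below the data are hypothetical, the function names are not from the paper, and the thresholds passed as `f_alpha` are illustrative values only (5.14 is used as a stand-in for a tabulated 5 % point with 2 and 6 degrees of freedom).

```python
def sums_of_squares(groups):
    """Between-group S1, within-group S2 and their degrees of freedom."""
    total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / total
    means = [sum(g) / len(g) for g in groups]
    s1 = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means))
    s2 = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    return s1, s2, len(groups) - 1, total - len(groups)

def exceeds(groups, f_alpha):
    """Evaluate both forms of the rejection rule for a given threshold."""
    s1, s2, f1, f2 = sums_of_squares(groups)
    a = -(f1 / f2) * f_alpha                 # the constant a of the text
    return (s1 / f1) / (s2 / f2) > f_alpha, s1 + a * s2 > 0

# Hypothetical observations in three groups of unequal size.
groups = [[4.1, 5.0, 6.2], [7.9, 8.4, 9.1, 8.8], [5.5, 6.1]]
print(exceeds(groups, 5.14), exceeds(groups, 50.0))
```

Because $S_2$ is positive, multiplying the inequality through by $f_1S_2/f_2$ never changes its direction, so the two forms always agree.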


4. We might perhaps note here that the algebraic results obtained in this present problem 
of the one-way classification, and the method used, may be applied to other more complicated 


analysis of variance problems. For example, if we consider a second-order hierarchical
classification we may write

$$\sum_{t=1}^{s} N_t(\bar{x}_{t..} - \bar{x}_{...})^2 + \sum_{t=1}^{s}\sum_{i=1}^{m_t} n_{ti}(\bar{x}_{ti.} - \bar{x}_{t..})^2 + \sum_{t=1}^{s}\sum_{i=1}^{m_t}\sum_{j=1}^{n_{ti}}(x_{tij} - \bar{x}_{ti.})^2 = \sum_{t=1}^{s}\sum_{i=1}^{m_t}\sum_{j=1}^{n_{ti}}(x_{tij} - \bar{x}_{...})^2$$

where

$$N_t = \sum_{i=1}^{m_t} n_{ti}.$$

The second expression, the ‘between subgroups within groups’ sum of squares can be
considered as a sum of statistics like $S_1$, independent, and therefore with additive cumulants.
The third expression, the ‘residual’ sum of squares, can be considered as a sum of independent
statistics like $S_2$. Similarly the cross-products can be obtained from the cross-products of
$S_1$ and $S_2$ although care will be necessary to ensure the enumeration of all the terms involved.
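5. It will only be necessary to outline the derivation of the moments of

$$S = S_1 + aS_2$$

briefly. We require

$$\begin{aligned}
\mu(S^r) &= \mathcal{E}[S - \mathcal{E}(S)]^r = \mathcal{E}\{[S_1 - \mathcal{E}(S_1)] + a[S_2 - \mathcal{E}(S_2)]\}^r\\
&= \mathcal{E}[S_1 - \mathcal{E}(S_1)]^r + {}^rC_1\,a\,\mathcal{E}\{[S_1 - \mathcal{E}(S_1)]^{r-1}[S_2 - \mathcal{E}(S_2)]\} + \ldots + a^r\,\mathcal{E}[S_2 - \mathcal{E}(S_2)]^r.
\end{aligned}$$

For condensation we shall use the customary notation

$$\mu(S_1^b S_2^c) = \mathcal{E}\{[S_1 - \mathcal{E}(S_1)]^b\,[S_2 - \mathcal{E}(S_2)]^c\}$$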







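and use $\kappa(S_1^b S_2^c)$ for the corresponding cumulant. It will only be necessary to use cumulants
for $b + c \ge 4$, but for these cases the expressions for the cumulants are much shorter than those
for the corresponding moments. Partly because of the weight of the algebra and partly
because the individual values may be used in other ways, as we indicated in the previous
section, we have evaluated the moments of $S_1$, $S_2$ and their joint moments separately. We
begin by a transformation of variables. We have assumed that

$$\mathcal{E}(x_{ti}) = \kappa_{1t} = \mathcal{E}(\bar{x}_{t.}), \qquad \mathcal{E}(\bar{x}_{..}) = \frac{1}{N}\sum_{t=1}^{s}\sum_{i=1}^{n_t}\mathcal{E}(x_{ti}) = \frac{1}{N}\sum_{t=1}^{s} n_t\kappa_{1t} = \bar{\kappa}_1 \text{ (say)}$$

where N is the total number of observations. Let

$$y_{ti} = x_{ti} - \kappa_{1t}, \qquad \kappa_{1t} - \bar{\kappa}_1 = C_t, \qquad \frac{1}{N}\sum_{t=1}^{s} n_tC_t^2 = \sigma^2, \qquad \sum_{t=1}^{s} n_tC_t = 0.$$

Substitution into $S_1$ gives

$$S_1 = \sum_{t=1}^{s} n_t\left(\frac{1}{n_t}\sum_{i=1}^{n_t} y_{ti}\right)^2 - \frac{1}{N}\left(\sum_{t=1}^{s}\sum_{i=1}^{n_t} y_{ti}\right)^2 + N\sigma^2 + 2\sum_{t=1}^{s}\sum_{i=1}^{n_t} C_ty_{ti},$$

or defining a k-statistic for each group, i.e. writing

$$k_{1t} = \frac{1}{n_t}\sum_{i=1}^{n_t} y_{ti},$$

we have

$$S_1 = \sum_{t=1}^{s} n_tk_{1t}^2 - \frac{1}{N}\left(\sum_{t=1}^{s} n_tk_{1t}\right)^2 + 2\sum_{t=1}^{s} n_tk_{1t}C_t + N\sigma^2$$

and

$$\mathcal{E}(S_1) = \sum_{t=1}^{s}\kappa_{2t}\left(1 - \frac{n_t}{N}\right) + N\sigma^2.$$

The first quantity with which we have to work is therefore

$$S_1 - \mathcal{E}(S_1) = \sum_{t=1}^{s} n_tk_{1t}^2 - \frac{1}{N}\left(\sum_{t=1}^{s} n_tk_{1t}\right)^2 + 2\sum_{t=1}^{s} n_tk_{1t}C_t - \sum_{t=1}^{s}\kappa_{2t}\left(1 - \frac{n_t}{N}\right).$$

Similarly

$$S_2 = \sum_{t=1}^{s}\sum_{i=1}^{n_t}(y_{ti} - \bar{y}_{t.})^2,$$

whence if we define a second k-statistic

$$k_{2t} = \frac{1}{n_t - 1}\sum_{i=1}^{n_t}(y_{ti} - \bar{y}_{t.})^2$$

we have

$$S_2 = \sum_{t=1}^{s}(n_t - 1)k_{2t}, \qquad \mathcal{E}(S_2) = \sum_{t=1}^{s}(n_t - 1)\kappa_{2t}, \qquad S_2 - \mathcal{E}(S_2) = \sum_{t=1}^{s}(n_t - 1)(k_{2t} - \kappa_{2t}).$$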

6. Moments and product moments are obtained by multiplying out the appropriate powers 
of the two quantities S, and S, and taking expectations, a procedure which is made easy by 
the use of David & Kendall’s tables of symmetric functions (1949). We give the results below: 

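$$\mathcal{E}(S_1) = \sum_{t=1}^{s}\kappa_{2t}\left(1 - \frac{n_t}{N}\right) + N\sigma^2, \qquad \mathcal{E}(S_2) = \sum_{t=1}^{s}(n_t - 1)\kappa_{2t},$$

$$\mu(S_1^2) = \sum_{t=1}^{s}\frac{\kappa_{4t}}{n_t}\left(1 - \frac{n_t}{N}\right)^2 + 2\sum_{t=1}^{s}\kappa_{2t}^2\left(1 - \frac{2n_t}{N}\right) + \frac{2}{N^2}\left(\sum_{t=1}^{s} n_t\kappa_{2t}\right)^2 + 4\sum_{t=1}^{s}\kappa_{3t}C_t\left(1 - \frac{n_t}{N}\right) + 4\sum_{t=1}^{s} n_t\kappa_{2t}C_t^2,$$

$$\mu(S_1S_2) = \sum_{t=1}^{s}\frac{n_t - 1}{n_t}\left(1 - \frac{n_t}{N}\right)\kappa_{4t} + 2\sum_{t=1}^{s}(n_t - 1)C_t\kappa_{3t},$$

$$\mu(S_2^2) = \sum_{t=1}^{s}\frac{(n_t - 1)^2}{n_t}\kappa_{4t} + 2\sum_{t=1}^{s}(n_t - 1)\kappa_{2t}^2,$$

$$\begin{aligned}
\mu_2(S) = {} & \sum_{t=1}^{s}\frac{\kappa_{4t}}{n_t}\left[1 - a + n_t\left(a - \frac{1}{N}\right)\right]^2 + 2\sum_{t=1}^{s}\kappa_{2t}^2\left[1 - a^2 + n_t\left(a^2 - \frac{2}{N}\right)\right] + \frac{2}{N^2}\left(\sum_{t=1}^{s} n_t\kappa_{2t}\right)^2\\
& + 4\sum_{t=1}^{s}\kappa_{3t}C_t\left[1 - a + n_t\left(a - \frac{1}{N}\right)\right] + 4\sum_{t=1}^{s} n_t\kappa_{2t}C_t^2.
\end{aligned}$$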
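The second-order moments of $S_1$ and $S_2$ can be checked exactly in a toy case by enumerating every possible sample. The sketch below is an illustration under stated assumptions: the two discrete error distributions and group sizes are hypothetical, and the closed forms coded in `formula_moments` are the standard expressions for the mean, variance and covariance of the between-group and within-group sums of squares with group-specific cumulants, written in the notation of the text.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical example: two groups with different discrete distributions.
groups = [
    {"n": 2, "values": [Fr(0), Fr(1), Fr(4)], "probs": [Fr(1, 2), Fr(1, 4), Fr(1, 4)]},
    {"n": 3, "values": [Fr(1), Fr(2)], "probs": [Fr(1, 3), Fr(2, 3)]},
]

def cumulants(values, probs):
    """kappa_1 .. kappa_4 of a discrete distribution."""
    mean = sum(v * p for v, p in zip(values, probs))
    mu = lambda r: sum((v - mean) ** r * p for v, p in zip(values, probs))
    return mean, mu(2), mu(3), mu(4) - 3 * mu(2) ** 2

def s1_s2(sample):
    total = sum(len(g) for g in sample)
    grand = sum(sum(g) for g in sample) / total
    means = [sum(g) / len(g) for g in sample]
    s1 = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(sample, means))
    s2 = sum((x - mu) ** 2 for g, mu in zip(sample, means) for x in g)
    return s1, s2

def exact_moments():
    """E(S1), E(S2), mu(S1^2), mu(S1 S2), mu(S2^2) by full enumeration."""
    axes = []
    for g in groups:
        axes.extend([list(zip(g["values"], g["probs"]))] * g["n"])
    outcomes = []
    for combo in product(*axes):
        prob = Fr(1)
        for _, p in combo:
            prob *= p
        xs = [v for v, _ in combo]
        sample, start = [], 0
        for g in groups:
            sample.append(xs[start:start + g["n"]])
            start += g["n"]
        outcomes.append((prob, *s1_s2(sample)))
    e1 = sum(p * a for p, a, b in outcomes)
    e2 = sum(p * b for p, a, b in outcomes)
    m11 = sum(p * (a - e1) ** 2 for p, a, b in outcomes)
    m12 = sum(p * (a - e1) * (b - e2) for p, a, b in outcomes)
    m22 = sum(p * (b - e2) ** 2 for p, a, b in outcomes)
    return e1, e2, m11, m12, m22

def formula_moments():
    """The same five quantities from the closed-form expressions."""
    N = sum(g["n"] for g in groups)
    cum = [cumulants(g["values"], g["probs"]) for g in groups]
    kbar = sum(g["n"] * c[0] for g, c in zip(groups, cum)) / N
    rows = [(g["n"], c[0] - kbar, c[1], c[2], c[3]) for g, c in zip(groups, cum)]
    e1 = sum(k2 * (1 - Fr(n, N)) + n * C ** 2 for n, C, k2, k3, k4 in rows)
    e2 = sum((n - 1) * k2 for n, C, k2, k3, k4 in rows)
    m11 = (sum(k4 / n * (1 - Fr(n, N)) ** 2 + 2 * k2 ** 2 * (1 - Fr(2 * n, N))
               + 4 * k3 * C * (1 - Fr(n, N)) + 4 * n * k2 * C ** 2
               for n, C, k2, k3, k4 in rows)
           + Fr(2, N * N) * sum(n * k2 for n, C, k2, k3, k4 in rows) ** 2)
    m12 = sum(Fr(n - 1, n) * (1 - Fr(n, N)) * k4 + 2 * (n - 1) * C * k3
              for n, C, k2, k3, k4 in rows)
    m22 = sum(Fr((n - 1) ** 2, n) * k4 + 2 * (n - 1) * k2 ** 2
              for n, C, k2, k3, k4 in rows)
    return e1, e2, m11, m12, m22

print(exact_moments())
print(formula_moments())
```

The variance of $S = S_1 + aS_2$ then follows for any a as $\mu(S_1^2) + 2a\,\mu(S_1S_2) + a^2\mu(S_2^2)$.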
It is of interest to note that if we put $C_t = 0$, $\kappa_{rt} = \kappa_r$, $r = 1, 2, \ldots$, we get


ms) = «(E 2-Fe — 


mS = 3B (my— ue (Sts SE), 





HS, 8.) = K,(s-14+ 5-3 a 


t=1 
8 


u(S3) = al - - i 


t= 1% 
results which were first given by Pearson (1931),


% 





1 
24/2 8 2 8 2 6/8 8 
+ 78 (3 mK) (= mnt) “9 3 (= Ku) —W 9) Kt (3 mx) 
5 8 8 K n,\2 8 / 3n, 8 
+ Wa (= mXa) | + 62 C; (1 -W +24 2% Cea n (2-3) + Po 14 Ky CP 


Fn) (eee) -3 Src) (1-9) 


23 CPey(1—H) +24 mCP —Fe(E mCiky) 





u(StS,) = Leal Hennecke — Ft) +y2(E Ou 4a) (5 me) 


850-8) 4-4 (Bone) [S(-2)] 
a (n,— nent ( -%) ~ sy (n,— 1) CK ka — 5 re (n,— 1)ky) (3 Cex) 
45 (m-)CPKy 





(n,— 1) n S, (m— 1) my §, (n,— 1) n, 
(SS?) = EOS (iA)ead “ Kaka (1-H) +4 3 Dg (1%) 


J N 
l ~ me 
+25 OT Ue — (3, 4-10) $2 Ging 8S (m— NG Kak 


t=1 t 


m(S}) = 5 tle 5 I gta OD ae 2 K+ 8 3 (m1) ad, 
t=1 |= 1% t=1 1% t 





From the above we find the third moment of S to be 


(8S) = , la [1-a+m(a-%) | +12 5 “e8s] 1 —a+n( -7) | [1- — ot +m| (*-x)| 
+ & Hol 1-a+m( -x)| + (4— 12a + 808) — 12m,(55- at 5 +08) 
vali ¥ ordered 
“Blgllin-eo- 3) Bld) 

— {3 xa 1-2+m( -x) |) +63 0/8 #[1-a+n(a-5) | 
+ 24 >  Cikuka(2+0+0*— 57) mat mCvey){ 3: Ky [:- -a+m(a-5) || 


ildne) (Seege[i-ernle- dl] 


8 2 
+24 ¥ n,CiKz,- WV wan) " 


t=1 
Using the procedure outlined we find that

$$\kappa(S_1^4) = \mu(S_1^4) - 3\mu^2(S_1^2)$$


“3 Bi) + Faenew) [EA (1B) ] +208 S5*(1-3) (1-7) 
aye (13) (1 yt ye) —y ew [E19 | 
-4oowa[2-9)¢-3)oms $3) Fe 4) 
+ Sexes (1— 2) + 5F5[ 2 (1-H) eu] + ya Brera 
~ $8 xpea)*[ Deu (2-3) ]+ ya Ceven) | Exurn(3— FF) | 
+493 Sth (9-4 + $5 comet) | Dku(1-¥) | 

4 5 (mca) ( Bea)? —F Bra) (Dh) + a (Dt 

~ 122 (Seyea) | Seu(1-H) [+ ye Bmar) | Eeu(3—Fy) | +484 


192 96 
a1 Smyth OE (Same) (Erkan) — aya (MAR) (Era) + aya Ema 

















+ al Emvea)*+ 850.08 (1H) — 55 (Sma)| ECeu (1—") 
-asi(-3) (0-9) -eoneno[z%e(-9)] 
$m [ (3) [nate 
— HBG R«) | Ea (1-3) | +55 ComCina) [ 2s (3-5) | 
— Fs (Eka) (Simpy) + 228 (Sime) (Sm Opkaeky) +e (mk) (EC ey) 
2) 192 


1) tye (Zeek) (my Cou) | Sew ( -*)] 


— = (Em CK) fe (2— 3) [=r cont [ Bxa(1- ») 


+ ri (1 - R) +3 Ww 5 (EmCky) (Lm Ky) + BLOF Ky Ky ( -3) 


-> > (Sm Cry) [ BOK (i - x) | + 965,01«4,( -F) 


—y mC) | Zeal — Ft) | + 192mm 08d + 8? (my) (mC) 
a = (Xn, C, x3) (X,C; Ky) + 32D CP ky (1 ~ a) + 1925), CP ky Koy 


— FF (EmCpku) (mC y) + 16S Oh ey, 
$$\kappa(S_1^3S_2) = \mu(S_1^3S_2) - 3\kappa(S_1S_2)\,\kappa(S_1^2)$$


77 Z(— 1) eal ON ince, [ Eu (3- 57) | 
H(e% a) [an 3) mea (-9)(-%) 
+ xq (D(my- 1) ky) [ Be (1 -3) | + 4D Ky ke 4" : (i -3) 


2 
+ ya (Dm 4) [Dl — 1) ey] 24 (Smeg) [Elm 1) ey] 














12 
+ 7a ( 














8 _— 
7” (i, Kz) [ZX (m4, — 1) KyKa) + 24D K3 Ky < ! (2 a wh) 


(Elm 1) Kaka] [ 2a (1 - R) | +5 +55 (Em Ky) [S(m%— 1) Ky] [ 2xu( * x) | 


~ HBV) [ Saat (2-32) | + PA cm — 1 08) Cm) 





) 







-1 3 
+05 Ok (-3) +“uy— tT Oy Ky (2-9) 





— Fy EmCiey) | Ey 3 + pal Bl 1) Coe] Ome 
beck, ares “F)-¥ ENG [ 2«u(1-5) | 
nae -9)] Honea zoe 9] 


— Ll — 1) Kal | Ce (1- FP) ] + 4800-1 Gat 





+ 5 (De) (Em) [504 — 1) kad — $2 (340,44) [D0 — 1) ka 


_& soe 1) KgyK oq] (1,0, Ky) 





1Ka(1- +) $485 (m%— 1) Of eyky— SOLE — 1) Gal (EOrKw) 


+243(n,— Hie’ [Xi(1% — 1) Kg] (Ly CZ Kgs) 


+82 (m— 1) CPx, 
$$\kappa(S_1^2S_2^2) = \mu(S_1^2S_2^2) - 2\kappa^2(S_1S_2) - \kappa(S_1^2)\,\kappa(S_2^2)$$


= ya ? se('-3) + 4D Key Ko4(™%4 — n+ — x) tyi(E ee) (7%) 


cg 3) Poe) Hes [ol] 
me. wlalu— 1) ky) [ 2s (1 -3) (=) | 

+ aye OY 3 (1—)"+ om — 0 (1-3) ] + gal Bl Daal 

+ 16D K4,K3, a (oe) + Fg Da) DK aka ty 1)] + LOE Kk (2-5) 
AF — Fp LEC4— 1 ewKa] | Emeu(1- 5) | 


+ (DM Kn) [Em — 1) 4] + 5 (Samy) [4 — 1) kal? 














+ 85K3, Ke, —— 


1) 
+45 KO ni <a (1-74) + 16x eaC 


—W v(z S- = Ks) (XC, Ky) + 48D Ky ky Oy 











“(1 ~%) + 8Ekuka Cy 
o(- 1)? 











5 —*(1 -5) + 16DkykyC; 


— Fy LEC = 1) Ope] [Ely 1) ky] + 82E yO KM 1) 





° —1])2 
— (El 1) keg] (EC) + 4E CH eg 


+ 16LC}(m%4— 1) Kyky + IGE CIR (n,— 1), 











$$\kappa(S_1S_2^3) = \mu(S_1S_2^3) - 3\kappa(S_1S_2)\,\kappa(S_2^2)$$


a _1)2 1)3 
=> ky A (1 — Ft) + 12d (.- — Ft) + 6K 3 














np 
$+ BAe A) (1 FH) — 5 (og) 0-1) a 
+45 TD (1- R) +245 ee (1 — Ft) + 243K nae 


+ ABS) (1-H) — FF Sal 1D) LSeal— V) 











‘ _1)2 —1)(5n,—7 
+25 “ Y Crk + 24 am Y CKeyk ns + 8Eky ky Cot Nene 
+ 4850;kyk4(m— 1), 


$$\kappa(S_2^4) = \mu(S_2^4) - 3\kappa^2(S_2^2)$$


= Be Gs meee AS 4 32 E yA 


ee 


+8DK3;, — (4n? — 9n, +6) + 144343, UY” = 


+ 96DK3, Ky a as Pa a 485x4,(n,— 1). 
t 








The expression for the fourth moment of S may be reached by converting the cumulants 
given above into moments and by combining these fourth order moments as previously 
described. The expression, as will be realized, is one of great length, and for this reason we do 
not give it here. It is considered more important to give the individual expressions as these 
will be of use in investigations other than that with which we are concerned. 
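The combination step described above can be sketched numerically. The snippet assumes, as in the later sections, that the statistic of interest is the linear combination $S = S_1 + aS_2$; the function names are illustrative and not taken from the paper. It uses only the multilinearity of cumulants and the standard relation between the fourth central moment and the cumulants.

```python
from math import comb


def kappa4_of_combination(k40, k31, k22, k13, k04, a):
    """Fourth cumulant of S1 + a*S2 from the five fourth-order product cumulants.

    k40 = kappa(S1^4), k31 = kappa(S1^3 S2), ..., k04 = kappa(S2^4).
    Cumulants are multilinear, so the binomial weights apply directly.
    """
    terms = [k40, k31, k22, k13, k04]
    return sum(comb(4, j) * a ** j * terms[j] for j in range(5))


def mu4_from_cumulants(k2, k4):
    """Fourth central moment from the second and fourth cumulants."""
    return k4 + 3 * k2 ** 2


# Example: if S2 coincided with S1, every product cumulant would equal
# kappa(S1^4), and S1 + S2 = 2*S1 has fourth cumulant 2**4 times as large.
example = kappa4_of_combination(1, 1, 1, 1, 1, a=1)
```

The same binomial pattern with three and two factors gives the third and second cumulants of the combination.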


7. The moments as given above are quite general, the only assumption which is made
being that the cumulants of the distribution of x in each group do exist and are the same for
each individual of the group. We now proceed to make certain simplifying assumptions which
will enable different types of distributions to be studied which are useful in statistical theory.

We first assume $\kappa_{rt} = \kappa_r$ for $t = 1, 2, \ldots, s$, $r \ge 2$,

or in other words we assume that each x has the same distribution for each group. Under
this simplifying assumption we find the second-order moments as given in the preceding
section and, remembering $\Sigma n_tC_t = 0$, we have also


l 4 
MS) = S15,—8(S)P = me (Exp— HUQ tye) + eema(B2-F+ yl) 
5 38% 68s 4 C, 
+26§(B5-57 - Ft +H) +88- 1) + 6x, ( 5 2G) 


+ 483K, OC, + 12K, 50? (1 -%) + 24x} Yn, C} + 8x,dn, CF, 


mStS) = xe -E5, +E—-(14+5)— - (245 xt x |t 4s (1+5)-1-22 -| 


+408 (- Eq) + Aus| 04(14 5) - ZS] - Beye BO+ dea Zl— 1) CF 





re 





1 1 2 1 1 
(8,83) = | Day (2+ 5) Ea +s(1+5)—1]+4ees[ o(2+5)— 1-25 | 
+ 2K§ [ 2*(1+ 5) -7-Bq—2 | +2 -2EG ¥ 2) — Bear Cp 
N] N nN 
1 1 1 2 
H(S3) = ko(— Sgt BEL — B+ N) + 12 Hy 20+) + 408 (E = 304) 


+ 8x3(N —s). 


When $n_t = n$, that is when the numbers are the same in all groups, then








| ae <2 = cc 
WSR) = Ke EGE + 12k — + 4g VE) 5 ange —1) 


a 
+ 12«4-— DCF} + 24K3nD CF + 8x,nDCF, 


p(SES,) = rE) bageg@@— DE 4 ae gin—1) ECF 


on nD aay (n—1)(s nen —1) og 2@—DE-) 


s(n—1)8 s(n —1)(n—2) 
n? n 


MS, S3) = ke 





_1)2 
M(S3) = Kg ote 1) 


8. The cumulants and product cumulants of the fourth order follow similarly. We omit
$\kappa(S_1^4)$ because of its great length. The other four are of reasonable proportions and we have


k(S3.8,) = a ( —t) + 12K : (1 -3) (--3) 
+ Ky] 26E 55 (m1) -5y (2 s—-> x) +p 8) +7 ‘ (N- — 8) (3s —5) 
-% 6-1)(s-25)-FW-9 E> ~(1-%) | 
+ iad 2 *( -3) (1-3) + +yalV-s)6-1)| 
+ 2dxeadl ™—( - Fi) +R av-a| 


+ 24g] B( -3)-¥ = (N —s) (28— 2) | 


+ 6K, [som (1 _ x) | + 24k; Ke zo) (2 a ae -¥ 


+ 2dxyn| EC m—( -3)]- dS EO + 12045 *o4(1-%) 














+ 4 Ey, —1) OF + 24n§ (55 Em{CP— BCI) + Bx, E(u 1) Ch, 











, (n,— 1)? n m—-1 Fe he -- 8 iot-U 
(SIS) = EO (1-H) + 4eoma| B+ (2) tHE ‘ 


+4k5 Ks [= : (1 -5) (1 ss - 


28 ae ae ee *) | 


sala [oH oof) 


+ 16] aa - 2) io 
t 


+803 (2) +e eee 














U7) 


+ 4k, [2a -%)| + 8k5Ke ial (1- *) + XC, a = or 


+ 16K,K [ sx0%— ( -*t) + =m 2 4%, ‘xa — 32x,K} DC, 
my N ny V 











_ 2 
+ 4K, 50} 5 + 164, SC} — 1) + 163 SCH(n,— 1), 








«(S, 83) = i a ~ (1-H) +12 BO the ‘(1 — Ft) + 25K 


x[sxeo oP, gp =D ( — 8) -5 (2) ae Kanal | 


oan Dt (1-4) | + 2a ( -%) 
+ 24K2K, [eee pent }(1 . v4 


























+2K7> ie + C+ 24K5Ka>, a C;,+ on ee 
ee. 1), 
x(84) = KER n? oe + 24k¢k, Pe apn + BKK > — 2) (m— 1)? 


ni 





= — (m— 1)? 
| (4n} 9n, + 6) + 144K 43D 
®, 


+ TS Dh 


+ see 3 21=1) + 48x4(N —s). 
t 


9. These expressions reduce still further if we take $n_t = n$ for $t = 1, 2, \ldots, s$, and we have


k(S} = SF 2A Keke nbs ets +32k;K oa) 5 cg SD [ae 1)? (0-2) 





+ 144K,K3 qe <1) + 960K, bowen) 





+ 48x4(s — 1) 


eo 2) 


+(¥C?) ote =2 - + 288K, Pe —— Ss 96«3 ——— + 19204) 


+ (SC?) (s2x,°—) = + 192k,x,n »: + 16nx,>C}, 





K(S 


ir 


ao oo 


— a. a Oe Oe UE. ClO 


K al] 


have 







x(S{S,) = g, See, ide 6 OO we 


(s — I) (8 —2) (n—1) 
n3s2 5X3 


ns n2s 





12x (n—1)(s—1) 
+s 1) (s— 1+ 24g @— I E—*) 


(n—1)(s—1) 


+ 12k, 
ns 


DCF + 48K Ko(m — 1) LCF + 8x, (n — 1) DCF, 


r(SPSP) = eg AAO eg, OH MMO AN 5 194, ONO 




















we n*s "2 ns 
+g ei 1) + 2(s—1)] + recyeg 2) 
+ 16K2k, es en N) + 4k, e—Iy DCF + 16K, K_(n — 1) SCF + 16x3(n — 1) HC?, 

— Sr + 12kgks a + 32K Ky a 
— 8k 5K i a (n— _ 1) _ gq @— D-H) 
+ 24K yk} Se 8 + 482k, (n— 1) (6-1) : 
n n 
rt) = gO 5 tg PY 4 a0 cg, APIO IF 


= vt - - 
+ 805) facn— 1)8— (m= 2)] + cy AP 4 96 cG 1 DD) 


+ 48x4s(n — 1). 


10. A knowledge of the first four cumulants of a frequency distribution does not necessarily
imply a knowledge of the distribution. We propose to estimate

$$P\{(S_1 + aS_2) > 0\}$$

by finding a frequency curve which will have the correct first four moments of $S_1 + aS_2$.
Since we do not know, however, how good such an approximate procedure will be, we
confine our calculations at first to such cases where the answer is known by other methods
so that a direct check will be possible. The only arithmetical answers that are known to us
are where the populations generating the groups are normal and have the same variance, so
that a further simplification of the moments given above is necessary, by writing

$$\kappa_{rt} = 0, \quad r \ge 3, \quad t = 1, 2, \ldots, s,$$
and
$$\kappa_{2t} = \kappa_2, \quad t = 1, 2, \ldots, s.$$

Making the necessary simplifications and substituting numerical values it is found that the
appropriate Pearson curves lie in the Type IV area. We therefore for simplicity of calculation
consider two other systems of frequency curves, the Gram-Charlier Type A and the $S_U$ system
derived by Johnson (1949). While the former may be expected to give good results in most
cases, the latter system is used as a check on whether such divergencies from exact values as
are found are due to method or are due to the negative frequencies implied by a Gram-Charlier
expansion being of non-negligible importance.
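As an illustration of the Gram-Charlier route, the following sketch evaluates an upper-tail probability from the first four cumulants. It assumes the Type A series is stopped at the term in the fourth cumulant; the paper does not print its working formula, so the function name and the truncation are assumptions. The tail integral uses the identity $\int_z^\infty \phi(u)H_n(u)\,du = \phi(z)H_{n-1}(z)$.

```python
from math import erfc, exp, pi, sqrt


def gram_charlier_upper_tail(x, mean, var, k3, k4):
    """Approximate P{X > x} from a four-cumulant Gram-Charlier Type A series."""
    sd = sqrt(var)
    z = (x - mean) / sd
    g1 = k3 / sd ** 3          # standardized third cumulant
    g2 = k4 / var ** 2         # standardized fourth cumulant
    phi = exp(-0.5 * z * z) / sqrt(2.0 * pi)
    normal_tail = 0.5 * erfc(z / sqrt(2.0))
    he2 = z * z - 1.0          # Hermite polynomial H_2(z)
    he3 = z ** 3 - 3.0 * z     # Hermite polynomial H_3(z)
    return normal_tail + phi * (g1 / 6.0 * he2 + g2 / 24.0 * he3)


# With S = S1 + a*S2 the required probability is the tail above zero.
p_example = gram_charlier_upper_tail(0.0, mean=0.0, var=1.0, k3=0.6, k4=0.0)
```

Because the series can take negative values in the tails, the result is not guaranteed to lie in [0, 1], which is the concern the text raises about negative frequencies.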










A numerical comparison with certain of the values quoted by Patnaik (1949), Table 7, and
derived from Tang's (1938) basic tables is given below in Table 1. The symbol, $\lambda$, used by
Patnaik is defined by

$$\lambda = \Sigma n_tC_t^2/\kappa_2.$$

In our case the number of groups is 4 and the number of observations $\Sigma n_t = 24$, corresponding
to Patnaik's case $f_1 = 3$, $f_2 = 20$.

These probabilities are generally slightly closer to the exact values than those obtained 
by using the approximation suggested by Patnaik and not quoted here. The figures show 


that for the calculation of power curves the technique employed for estimating probabilities 
is likely to be adequate. 


Table 1. Comparison of probabilities estimated from frequency curves with the exact values

Significance               P{S_1 + aS_2 > 0}
   level        λ     Gram-Charlier   Johnson (S_U)    Exact
   0.05         0        0.0523          0.0512        0.0500
                4        0.290           0.305         0.300
               16        0.877           0.873*        0.874
   0.01         0        0.0095          0.0097        0.0100
                4        0.112           0.114         0.113
               16        0.657           0.654         0.653

* Log-normal curve fitted in this case.


11. The moments given above are so general that to list all the simplifications which might 
be made would require much space. We shall give two illustrations here of the way in which 
they may be used to attack problems concerning the single classification in the analysis of 
variance, and for our first illustration we consider the effect of unequal variances. It is
assumed that there are s normal populations with unequal means and variances, and that 


the numbers drawn from the populations to form the groups are not necessarily equal. All 
cross-product cumulants vanish and we have 


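$$\mathcal{E}(S_1) = \sum_{t=1}^{s}\kappa_{2t}\left(1 - \frac{n_t}{N}\right) + \Sigma n_tC_t^2; \qquad \mathcal{E}(S_2) = \sum_{t=1}^{s}(n_t - 1)\kappa_{2t},$$

$$\kappa(S_1^2) = 2\Sigma\kappa_{2t}^2\left(1 - \frac{2n_t}{N}\right) + \frac{2}{N^2}(\Sigma n_t\kappa_{2t})^2 + 4\Sigma n_t\kappa_{2t}C_t^2,$$

$$\kappa(S_2^2) = 2\Sigma(n_t - 1)\kappa_{2t}^2,$$

$$\kappa(S_1^3) = 8\Sigma\kappa_{2t}^3\left(1 - \frac{3n_t}{N}\right) - \frac{8}{N^3}(\Sigma n_t\kappa_{2t})^3 + \frac{24}{N^2}(\Sigma n_t\kappa_{2t})(\Sigma n_t\kappa_{2t}^2) + 24\Sigma n_tC_t^2\kappa_{2t}^2 - \frac{24}{N}(\Sigma n_tC_t\kappa_{2t})^2,$$

$$\kappa(S_2^3) = 8\Sigma(n_t - 1)\kappa_{2t}^3,$$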














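$$\kappa(S_1^4) = 48\Sigma\kappa_{2t}^4\left(1 - \frac{4n_t}{N}\right) + \frac{48}{N^4}(\Sigma n_t\kappa_{2t})^4 + \frac{96}{N^2}(\Sigma n_t\kappa_{2t}^2)^2 - \frac{192}{N^3}(\Sigma n_t\kappa_{2t})^2(\Sigma n_t\kappa_{2t}^2)$$
$$\qquad + \frac{192}{N^2}(\Sigma n_t\kappa_{2t}^3)(\Sigma n_t\kappa_{2t}) + 192\Sigma n_tC_t^2\kappa_{2t}^3$$
$$\qquad + \frac{192}{N}(\Sigma n_tC_t\kappa_{2t})\left[\frac{1}{N}(\Sigma n_t\kappa_{2t})(\Sigma n_tC_t\kappa_{2t}) - 2\Sigma n_tC_t\kappa_{2t}^2\right],$$
$$\kappa(S_2^4) = 48\Sigma(n_t - 1)\kappa_{2t}^4.$$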
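The cumulants of this section are easy to evaluate when the group means are equal ($C_t = 0$). The sketch below is an illustration under that assumption only; the function name and argument layout are hypothetical. With equal variances the formulas must reduce to the cumulants of $\kappa_2\chi^2$ with $s - 1$ and $N - s$ degrees of freedom, which gives a simple check.

```python
def s1_s2_cumulants(sizes, variances):
    """First four cumulants of S1 and S2 for normal groups with equal means.

    sizes: group sizes n_t; variances: group variances kappa_2t.
    Returns two lists [k1, k2, k3, k4], for S1 and S2 respectively.
    """
    N = sum(sizes)
    # Power sums p_r = sum of n_t * kappa_2t**r.
    p1, p2, p3 = (sum(n * v ** r for n, v in zip(sizes, variances)) for r in (1, 2, 3))

    def weighted(r):
        # sum of kappa_2t**r * (1 - r * n_t / N)
        return sum(v ** r * (1 - r * n / N) for n, v in zip(sizes, variances))

    s1 = [
        weighted(1),
        2 * weighted(2) + 2 * p1 ** 2 / N ** 2,
        8 * weighted(3) - 8 * p1 ** 3 / N ** 3 + 24 * p1 * p2 / N ** 2,
        48 * weighted(4) + 48 * p1 ** 4 / N ** 4 + 96 * p2 ** 2 / N ** 2
        - 192 * p1 ** 2 * p2 / N ** 3 + 192 * p1 * p3 / N ** 2,
    ]
    s2 = [
        c * sum((n - 1) * v ** r for n, v in zip(sizes, variances))
        for r, c in ((1, 1), (2, 2), (3, 8), (4, 48))
    ]
    return s1, s2


# Four groups of six with unit variances: S1 ~ chi-square(3), S2 ~ chi-square(20).
equal_s1, equal_s2 = s1_s2_cumulants([6, 6, 6, 6], [1.0, 1.0, 1.0, 1.0])
```

Unequal variances, such as those of Table 2, can be passed in the second argument in the same way.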


When the means of the groups are assumed equal all terms containing $C_t$ disappear; in our
brief arithmetical investigation we have assumed this to be so. Four groups each containing
six observations were taken, and six combinations of values for $\kappa_{21}$, $\kappa_{22}$, $\kappa_{23}$, $\kappa_{24}$ were
considered. The effect of inequalities in these variances on the nominal significance levels is
shown in Table 2, the Gram-Charlier system being used for purposes of estimation.


Table 2. Effect of unequal variances on the nominal significance levels of the F-test

                                  Nominal level
κ_21   κ_22   κ_23   κ_24        0.05      0.01
  1      1      1      1         0.052     0.010
  1      1      1      2         0.057     0.013
  1      1      2      2         0.057     0.013
  1      2      2      2         0.055     0.011
  1      1      1      3         0.063     0.021
  1      1      2      3         0.060     0.015
  1      1      3      3         0.063     0.016














The first line of the table has been reproduced from Table 1 so that the accuracy of the 
method employed may be borne in mind. It will be noted, as might be expected, that the 
situation most likely to lead to wrong judgements of significance will be when one of the 
group variances is very much larger than all the others. 
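An independent check of the kind of figure shown in Table 2 can be made by simulation. This is not the authors' method (they fit a Gram-Charlier curve to the cumulants); it is a rough Monte Carlo sketch, assuming normal groups with equal means and taking the tabulated upper 5% point of $F$ with 3 and 20 degrees of freedom as 3.098. The names are illustrative.

```python
import random

F_CRIT_5PCT_3_20 = 3.098  # upper 5% point of F(3, 20), from standard tables


def rejection_rate(variances, n_per_group=6, reps=20000, seed=1):
    """Monte Carlo estimate of the true size of the nominal 5% F-test."""
    rng = random.Random(seed)
    s = len(variances)
    f1, f2 = s - 1, s * (n_per_group - 1)
    rejections = 0
    for _ in range(reps):
        groups = [[rng.gauss(0.0, v ** 0.5) for _ in range(n_per_group)]
                  for v in variances]
        means = [sum(g) / n_per_group for g in groups]
        grand = sum(means) / s  # equal group sizes, so a plain average
        between = n_per_group * sum((m - grand) ** 2 for m in means)
        within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
        if (between / f1) / (within / f2) > F_CRIT_5PCT_3_20:
            rejections += 1
    return rejections / reps


rate_equal = rejection_rate([1, 1, 1, 1])
rate_unequal = rejection_rate([1, 1, 1, 3])
```

With 20,000 replicates the standard error of each estimate is roughly 0.002, so differences in the third decimal place between simulation and Table 2 should not be over-interpreted.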


12. Gayen (1950) has discussed the effect of non-normality on the F-test when the parent 
population generating the samples is of Gram-Charlier type. We have assumed nothing about 
the shape of the parent populations, except that the cumulants exist, and that these 
cumulants may be different for each population generating each group, and thus it would be 
possible with the aid of the moments and product-moments which we have given earlier, to 
discuss approximations to the distribution of the F ratio to order $1/N^3$ in a very general way.
We outline as a second illustration how this might be done for the z ratio, our choice of 
z rather than F being based on the reason that previous work by David (1949) showed that 
the numerical coefficients of the expansions were smaller for z than for F. Thus 


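$$2z = \log\{(S_1/f_1)/(S_2/f_2)\} = \log\left(1 + \frac{S_1 - \mathcal{E}(S_1)}{\mathcal{E}(S_1)}\right) - \log\left(1 + \frac{S_2 - \mathcal{E}(S_2)}{\mathcal{E}(S_2)}\right) + \log\frac{\mathcal{E}(S_1)/f_1}{\mathcal{E}(S_2)/f_2}.$$

As previously let $\mathcal{E}\{[S_1 - \mathcal{E}(S_1)]^i\,[S_2 - \mathcal{E}(S_2)]^j\} = \mu(S_1^iS_2^j)$
and further write $M(S_1^iS_2^j) = \mu(S_1^iS_2^j)/\{[\mathcal{E}(S_1)]^i\,[\mathcal{E}(S_2)]^j\}$.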













Expanding the first two logarithms on the right-hand side, a procedure which will be valid
only for $s$ and $n$ large, and taking expectations we have
$$2\mathcal{E}(z) - \log\{[\mathcal{E}(S_1)/f_1]/[\mathcal{E}(S_2)/f_2]\} = -\tfrac12[M(S_1^2) - M(S_2^2)] + \tfrac13[M(S_1^3) - M(S_2^3)] - \tfrac14[M(S_1^4) - M(S_2^4)] + \ldots.$$
If we decide to work to order $1/N^3$ there will be contributions from terms of the fifth and
sixth orders for
$$\mu(S^6) = \kappa(S^6) + 15\kappa(S^4)\kappa(S^2) + 10\kappa^2(S^3) + 15\kappa^3(S^2),$$
$$\mu(S^5) = \kappa(S^5) + 10\kappa(S^3)\kappa(S^2),$$
and $\kappa(S^3)\kappa(S^2)$ and $\kappa^3(S^2)$ will both be of order $1/N^3$.
The second moment may be obtained by squaring the original expression for z, whence
$$\mathcal{E}\{2z - \log[\mathcal{E}(S_1)/f_1]/[\mathcal{E}(S_2)/f_2]\}^2 = M(S_1^2) - 2M(S_1S_2) + M(S_2^2) - M(S_1^3) + M(S_1^2S_2)$$
$$\qquad + M(S_1S_2^2) - M(S_2^3) + \tfrac{11}{12}M(S_1^4) - \tfrac23M(S_1^3S_2)$$
$$\qquad - \tfrac12M(S_1^2S_2^2) - \tfrac23M(S_1S_2^3) + \tfrac{11}{12}M(S_2^4) - \ldots.$$
The same remarks hold good about the order of the expansions. Similar expressions may be 
found for the third and fourth moments. 
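The numerical coefficients in the second-moment expansion come from squaring $\log(1+x) - \log(1+y)$, with $x$ and $y$ standing for the relative deviations of $S_1$ and $S_2$ from their expectations. A short exact-arithmetic sketch (the helper names are illustrative) reproduces them to the fourth order:

```python
from fractions import Fraction


def log1p_terms(slot, order):
    """Series of log(1 + u) up to u**order, keyed by (power of x, power of y)."""
    terms = {}
    for k in range(1, order + 1):
        key = (k, 0) if slot == "x" else (0, k)
        terms[key] = Fraction((-1) ** (k + 1), k)
    return terms


def multiply(p, q, order):
    """Product of two bivariate series, discarding terms above the given order."""
    out = {}
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            if i1 + i2 + j1 + j2 <= order:
                key = (i1 + i2, j1 + j2)
                out[key] = out.get(key, Fraction(0)) + c1 * c2
    return out


def squared_log_difference(order=4):
    """Coefficients of [log(1 + x) - log(1 + y)]**2 up to total degree `order`."""
    diff = log1p_terms("x", order)
    for key, c in log1p_terms("y", order).items():
        diff[key] = -c
    return multiply(diff, diff, order)


coeffs = squared_log_difference()
# coeffs[(i, j)] multiplies M(S1^i S2^j) in the expansion of the second moment.
```

Raising `order` gives the coefficients needed if the expansion is carried to the fifth and sixth orders.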


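13. Particularly simple expressions result from the procedure of the previous section if
it is assumed that the generating populations have the same non-normal distribution for
each group, the same mean, if the numbers in the groups are the same, and if we work to
order $f^{-2}$ only. Writing

$$\gamma_r = \kappa_{r+2}/\kappa_2^{\frac12(r+2)}, \quad f_1 = s - 1, \quad f_2 = s(n - 1),$$

we shall have $\mathcal{E}(S_1) = \kappa_2f_1$, $\mathcal{E}(S_2) = \kappa_2f_2$

and
$$M(S_1^2) = \frac{2}{f_1} + \frac{\gamma_2}{N}, \quad M(S_1S_2) = \frac{\gamma_2}{N}, \quad M(S_2^2) = \frac{2}{f_2} + \frac{\gamma_2}{N},$$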
Y, , 12 4yif,-1.& 

M(S}) = fate Th 


NtR 





M(S?8,) = 2s, Mt 





Nf,’ 
MS, 2) = Ft +E, 
12 4 -1 
M (S83) = s+ ar 4+——h 4y¥i a (fe -h + 


If the expansions are taken to order $f^{-2}$ no other quantities than these will be required.
Substituting into the formulae of the previous section and for brevity writing 


® 1 
al am a a 
we have E(z) = —} (a, + fora) + LH+GH (a-a+4), 


(2!) = § (26+ 9.77) 9% - F(a, -A,-2-2), 


whence alts) alata) aman lt) AA 


te 















We note that when the populations generating the samples are normal, we obtain the correct
first two terms of the expansion of $\sigma^2$ in terms of $f_1$ and $f_2$. It is clear, for the case considered,
that the effect of the departure from normality of the parent population on the mean and the
standard error of z will be slight and it is not surprising that Pearson (1931) was unable to
draw any really firm conclusion from his sampling experiments.


14. These two illustrations are sufficient to show the great range of usefulness which the
general expressions for the moments have. It has not been possible in the course of a single 
paper to do anything but the minimum of arithmetical investigation. It is hoped however 
to continue the arithmetical side in a further publication. 


REFERENCES 


David, F. N. (1949). Biometrika, 36, 394.
David, F. N. & Kendall, M. G. (1949). Biometrika, 36, 431.
Gayen, A. K. (1950). Biometrika, 37, 236.
Geary, R. C. (1947). Biometrika, 34, 209.
Johnson, N. L. (1949). Biometrika, 36, 149.
Patnaik, P. B. (1949). Biometrika, 36, 202.
Pearson, E. S. (1931). Biometrika, 23, 119.
Quensel, C. E. (1947). Skand. AktuarTidskr. 30, 44.
Tang, P. C. (1938). Statist. Res. Mem. 2, 126.













EFFICIENCY OF THE METHOD OF MOMENTS AND THE 
GRAM-CHARLIER TYPE A DISTRIBUTION 


By L. R. SHENTON, College of Technology, Manchester 


1. INTRODUCTION 


We propose to consider the efficiency of the method of moments applied to fitting the Gram- 
Charlier series of Type A. In practice, since the sampling variances of moments higher than 
the fourth are large, only the first four moments of the observed frequency data are in general 
used, so that we shall confine our attention to the Type A series taken as far as the term 
involving the fourth moment. 

It has been pointed out by Fisher (1921, pp. 355-6) that in the region of the normal point 
the Pearson system of curves closely resembles the system for which the method of moments 
is efficient; for 80% efficiency this region is restricted to $2.62 < \beta_2 < 3.42$, $\beta_1 < 0.1$. The
question arises as to how the departures from normality expressed by the Gram-Charlier 
type distribution affect the efficiency of moment-estimation and how this compares* with the 
situation for the Pearson system. 

To evaluate the efficiency of moment estimation for the Gram-Charlier series we shall use 
a slight modification of the method given in a previous paper on maximum likelihood and 
moments (Shenton, 1950†). It consists in developing a determinantal expansion, which, since
its terms are positive, provides a lower bound to the efficiency; in particular cases when there 
is one parameter for estimation it is possible to set lower and upper bounds to the efficiency. 


2. THE EXPANSION FOR THE INFORMATION DETERMINANT 


The expansion for the efficiency H, given in M.L. expression (16) is capable of a simpler form 
in the case when the frequency function has a polynomial factor, as for example is the case 
with Type A. For the information determinant is of the form 

















%A,(x)A,(x)w(x)dx| _. 
= i k ly ’ 
pit I. B(x) (j,# = 1,2,...,m), (1) 
and it may be shown under fairly general conditions that 
. | (0) [a] 
A =(— n] : ns ~ . 
( ) pe od [a] hn. (Pl. | [Bas | ( ) 
where (i) Vi ae 
(“Jas = ee iin - 
n%o n& n%&s 


and [a]/, is its transpose. 
b ‘ 
(ii) m= | p(x) A, (x) w(x)dx (j =1,2,...,n;k = 0,1, ..08)-| 


b (+) 
Bix = Bus = [. P(X) p(x) B(x) w(x)dx (j,k =0,1,...), | 


* I am indebted to Prof. M. G. Kendall for this suggestion. 
† To be referred to as M.L.





L. R. SHENTON 59 


and $p_j(x)$, $j = 0, 1, \ldots$, is a polynomial in $x$ of precise degree $j$. The expansion (2), apart from
a factor depending on the covariance term, is a compact form of (16) in M.L. and can be shown
to converge in certain cases by an appeal to Parseval's theorem. In particular it converges
to $\Delta$ when $a = -\infty$, $b = \infty$, $A_j(x)$ is a polynomial in $x$, $w(x) = e^{-\frac12x^2}/\sqrt{(2\pi)}$ and $B(x)$ is a
polynomial always positive for real $x$. Since (2) is equivalent to M.L. (16), the expansion will
give an increasing sequence as an approximation to $\Delta$. It is important to remember that the
polynomial $B(x)$ must be positive for $x$ in $(a, b)$. We must therefore consider the conditions
under which this is true for the case of Type A.

3. THE ADMISSIBLE PARAMETER VALUES FOR TYPE A

We consider the frequency function
$$P_x\,dx = \left\{1 + \frac{a_3}{3!}H_3(x) + \frac{a_4}{4!}H_4(x)\right\}g(x)\,dx, \qquad (5)$$
where $x = (X - m)/\sigma$, $g(x) = e^{-\frac12x^2}/\sqrt{(2\pi)}$
and $H_r(x) = e^{-\frac12D^2}x^r$ is the $r$th Hermite polynomial with the property
$$\int_{-\infty}^{\infty}H_r(x)H_s(x)g(x)\,dx = r! \quad (r = s), \qquad = 0 \quad (r \ne s). \qquad (6)$$
Using $\frac{d}{dx}H_r(x) = rH_{r-1}(x)$ along with $x^r = e^{\frac12D^2}H_r(x)$ the moments of (5) may be seen to be
$$\mu_{2r} = \frac{(2r)!}{2^r\,r!}\left\{1 + \frac{r(r-1)}{6}a_4\right\} \quad (r = 1, 2, \ldots),$$
$$\mu_{2r-1} = \frac{(2r-1)!}{2^{r-2}(r-2)!\,3!}\,a_3 \quad (r = 2, 3, \ldots), \qquad (7)$$
so that for the measures of skewness and kurtosis we have
$$\beta_1 = \mu_3^2/\mu_2^3 = a_3^2, \quad \beta_2 = \mu_4/\mu_2^2 = a_4 + 3,$$
$$\gamma_1 = a_3, \quad \gamma_2 = a_4.$$
Now a term in the information determinant such as $E\{\partial\log P_x/\partial\theta\}^2$ involves the quartic
$B(x) = 24 + 4a_3H_3(x) + a_4H_4(x)$ in the denominator so that to apply (2) we must ensure that
this is positive for $-\infty < x < \infty$. Writing
$$H = -a_4^2 - a_3^2, \quad J = -24a_3^2 - 2a_4^3 - 24a_4^2 - 6a_3^2a_4,$$
$$I' = 24a_4 + 6a_4^2 + 12a_3^2, \quad V = I'^3 - 27J^2,$$
the quartic will have four imaginary roots if $V > 0$ and either $H > 0$ or $2HI' - 3a_4J > 0$. (See,
for example, Burnside & Panton, 1881, pp. 116 and 187.) Hence $B(x)$ is positive for real
$x$ provided
$$(8a_4 + 2a_4^2 + 4a_3^2)^3 > (24a_3^2 + 2a_4^3 + 24a_4^2 + 6a_3^2a_4)^2, \qquad (8)$$
and $4a_4^3 - a_4^4 + 4a_4a_3^2 > 3a_3^2a_4^2 + 4a_3^4$.
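The moments of the Type A function follow from expanding $x^n$ in Hermite polynomials and using the orthogonality property. As a check, the sketch below compares closed-form values of this kind with a direct numerical integration of $x^n$ against the density. The closed forms coded here are derived from the Hermite expansion and are written in one convenient arrangement; the function names are illustrative.

```python
from math import exp, factorial, pi, sqrt


def type_a_moment_closed(n, a3, a4):
    """n-th moment about the mean of the standardized Type A density."""
    if n % 2 == 0:
        r = n // 2
        return factorial(2 * r) / (2 ** r * factorial(r)) * (1 + r * (r - 1) * a4 / 6)
    r = (n + 1) // 2  # n = 2r - 1
    if r < 2:
        return 0.0
    return factorial(2 * r - 1) * a3 / (2 ** (r - 2) * factorial(r - 2) * 6)


def type_a_moment_numeric(n, a3, a4, half_width=12.0, steps=24000):
    """Trapezoidal integration of x**n times the Type A density."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        he3 = x ** 3 - 3 * x
        he4 = x ** 4 - 6 * x * x + 3
        density = (1 + a3 * he3 / 6 + a4 * he4 / 24) * exp(-0.5 * x * x) / sqrt(2 * pi)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x ** n * density
    return total * h


moments = [type_a_moment_closed(n, a3=0.3, a4=0.5) for n in range(1, 7)]
```

For $a_3 = 0.3$, $a_4 = 0.5$ this gives $\mu_3 = a_3$ and $\mu_4 = 3 + a_4$, in agreement with the skewness and kurtosis measures quoted in the text.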
Table 1

a_4   | 0.1   | 0.2   | 0.3   | 0.4   | 0.5   | 0.6   | 0.7   | 0.8   | 0.9   | 1.0   | 1.5   | 2.0   | 2.5   | 3.0   | 3.5
a_3^2 | 0.026 | 0.070 | 0.123 | 0.180 | 0.242 | 0.306 | 0.370 | 0.435 | 0.499 | 0.562 | 0.845 | 1.039 | 1.100 | 0.990 | 0.658

































































60 Efficiency of the method of moments 


The second condition implies that $0 < a_4 < 4$. Table 1 shows the restriction imposed by (8)
on $a_3^2$ for given $a_4$ in (0, 4). For later use we compare the scope of Type A under these
restrictions with the Pearson system of curves, using a ($\gamma_1^2$, $\gamma_2$) diagram. The region of validity
of Type A is that enclosed by the curved boundary and the $\gamma_2$ axis in Fig. 1. Superimposed
on this is the corresponding division for the Pearson system (Pearson, 1945, diagram
XXXV, p. 66). It will be seen that the Pearson curves corresponding to the admissible
Type A curves are Type IV, Type VII (including the heterotypic cases of these) and Types
V and VI, putting them in approximate order of importance. Type IV and the symmetrical
case of this, Type VII, occupy most of the Type A region. We notice that Types I and III do
not share the region.
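The boundary tabulated in Table 1 can be recomputed from the first inequality in (8). The following is a minimal sketch, assuming that the tabulated value is the smallest positive root in $a_3^2$ of that inequality taken as an equality (this reproduces several tabulated entries to three decimals); the function names and the scan-then-bisect strategy are illustrative choices.

```python
def condition_gap(a3_sq, a4):
    """Left side minus right side of the first inequality in (8)."""
    lhs = (8 * a4 + 2 * a4 ** 2 + 4 * a3_sq) ** 3
    rhs = (24 * a3_sq + 2 * a4 ** 3 + 24 * a4 ** 2 + 6 * a3_sq * a4) ** 2
    return lhs - rhs


def max_admissible_a3_sq(a4, step=0.01, upper=5.0):
    """Smallest positive a3^2 at which the first condition in (8) fails."""
    lo, t = 0.0, step
    while condition_gap(t, a4) > 0:
        lo, t = t, t + step
        if t > upper:
            return None  # no sign change found in the scanned range
    hi = t
    for _ in range(60):  # bisection on the bracketing interval
        mid = 0.5 * (lo + hi)
        if condition_gap(mid, a4) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def b_quartic(x, a3, a4):
    """B(x) = 24 + 4*a3*H_3(x) + a4*H_4(x)."""
    return 24 + 4 * a3 * (x ** 3 - 3 * x) + a4 * (x ** 4 - 6 * x ** 2 + 3)


boundary_at_one = max_admissible_a3_sq(1.0)
```

Evaluating `b_quartic` on a grid for parameter values just inside and just outside the boundary shows the quartic staying positive in the first case and dipping below zero in the second.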


Fig. 1. Showing the region of validity of Type A (enclosed by curve and $\gamma_2$ axis) and the
corresponding Pearson distribution.








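4. EFFICIENCY OF THE FOUR PARAMETER CASE OF TYPE A
The parameters $m$, $\sigma$, $a_3$, $a_4$ in (5) may be estimated by moments using
$$\hat\theta_1 = \Sigma X/N, \quad \hat\theta_j = \Sigma(X - \hat\theta_1)^j/N, \quad (j = 2, 3, 4).$$
The efficiency $E_4$ is given by
$$E_4^{-1} = |I| \times |[\mathrm{cov}(\hat\theta_j, \hat\theta_k)]|, \qquad (9)$$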

- ee ee ae N oP, oP, 
where the informatica matrix is J = [z (pas? 50) | ; 
Since to order $N^{-1}$
$$\mathrm{cov}(m_j, m_k) = \{\mu_{j+k} - \mu_j\mu_k + jk\mu_2\mu_{j-1}\mu_{k-1} - j\mu_{j-1}\mu_{k+1} - k\mu_{j+1}\mu_{k-1}\}/N,$$
using (7) it may be verified that
2-—az+a, 6a, —a,0, —8+ 4a, 
N |[cov (9;,6,)]| = o7°| 6a,-a,4, 6—a23+9a,—aj 12a,—a,a, |=o%Vsay. (10) 
| —8+4a, 12a,—a3a, 56 + 24a, —aj 
For the information matrix $I$ we take in (1)-(4)

$$A_j(x)\,g(x) = \frac{\partial P_x}{\partial\theta_j}, \quad (j = 1, 2, 3, 4);$$
$$B(x)\,g(x) = P_x;$$
$$w(x) = g(x);$$
$$p_k(x) = H_k(x), \quad (k = 0, 1, \ldots);$$
$$a = -\infty, \quad b = \infty;$$

































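so that
$${}_j\alpha_k = \int_{-\infty}^{\infty}H_k(x)\frac{\partial P_x}{\partial\theta_j}\,dx, \quad (j = 1, 2, 3, 4;\ k = 0, 1, \ldots), \qquad (11)$$
$$\beta_{jk} = \beta_{kj} = \int_{-\infty}^{\infty}H_j(x)H_k(x)P_x\,dx, \quad (j, k = 0, 1, \ldots). \qquad (12)$$

The integrals in (12) may be evaluated by using the recurrence relation

$$H_j(x)H_k(x) = H_{j+k}(x) + jkH_{j+k-2}(x) + \frac{j(j-1)k(k-1)}{2!}H_{j+k-4}(x)$$
$$\qquad + \frac{j(j-1)(j-2)k(k-1)(k-2)}{3!}H_{j+k-6}(x) + \ldots,$$

the last term on the right being a multiple of $H_1(x)$ or $H_0(x)$ according as $j + k$ is odd or even.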
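The product relation for Hermite polynomials can be confirmed by direct polynomial arithmetic. The sketch below builds the polynomials from the three-term recurrence $H_{k+1}(x) = xH_k(x) - kH_{k-1}(x)$ and compares both sides; the general coefficient of $H_{j+k-2r}$ is written as $r!\binom{j}{r}\binom{k}{r}$, which equals the factorial form in the text. Function names are illustrative.

```python
from math import comb, factorial


def hermite(n):
    """Coefficients (ascending powers of x) of the Hermite polynomial H_n."""
    prev, cur = [1], [0, 1]
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [0] + cur                 # x * H_k
        for i, c in enumerate(prev):    # minus k * H_{k-1}
            nxt[i] -= k * c
        prev, cur = cur, nxt
    return cur


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def linearized_product(j, k):
    """Right-hand side of the product relation, as a coefficient list."""
    total = [0] * (j + k + 1)
    for r in range(min(j, k) + 1):
        weight = factorial(r) * comb(j, r) * comb(k, r)
        for i, c in enumerate(hermite(j + k - 2 * r)):
            total[i] += weight * c
    return total


# H_2 * H_3 = H_5 + 6 H_3 + 6 H_1
lhs = poly_mul(hermite(2), hermite(3))
rhs = linearized_product(2, 3)
```

Written with falling factorials, the series stops by itself once $r$ exceeds the smaller of $j$ and $k$.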
Moreover, oP, 2 
S6e — (Hale) + $4 Hala) + Sea] 2, 
oP, 
aP, _ Hy(x)g(x) 
06, 3! oF’ 
OP, _ H,{z) g(x) 
00, 4! of ° 


Hence from (11) and (12) we find 





= (H(2)- 22) wae )+ 4 Ho} 22), 


5a : 
2, a 1% =, 1%; = 0, (j ¥# 1,4, 5); 


’ 275 = o > 2%, 2 2%; — 0, (j+ 2, 4, 5, 6); as) 
1 
3% = 73° 3% = 9, (9+3); 


_— 





4% = a% = 9, (7 +4); 
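and
$$\beta_{j,j} = j!\left\{1 + \tfrac14j(j-1)a_4\right\} = b_j,$$
$$\beta_{j,j+1} = j\,(j+1)!\,\frac{a_3}{2} = c_j,$$
$$\beta_{j,j+2} = j\,(j+2)!\,\frac{a_4}{3!} = d_j,$$
$$\beta_{j,j+3} = (j+3)!\,\frac{a_3}{3!} = f_j, \qquad (14)$$
$$\beta_{j,j+4} = (j+4)!\,\frac{a_4}{4!} = g_j, \quad (j = 0, 1, \ldots);$$
$$\beta_{j,j+k} = 0, \quad (k = 5, 6, \ldots).$$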
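The band structure of the $\beta_{jk}$ can be checked exactly. The sketch below integrates $H_j(x)H_k(x)$ against the Type A function using the moments of the normal curve and compares the result with closed forms obtained from the product relation and the orthogonality property (6). The closed forms as coded are derived here under those two facts; the names are illustrative.

```python
from fractions import Fraction
from math import factorial


def hermite(n):
    """Coefficients (ascending powers) of H_n from H_{k+1} = x H_k - k H_{k-1}."""
    prev, cur = [1], [0, 1]
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [0] + cur
        for i, c in enumerate(prev):
            nxt[i] -= k * c
        prev, cur = cur, nxt
    return cur


def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def normal_moment(n):
    """E[x**n] for the unit normal curve: zero for odd n, (n-1)!! for even n."""
    return 0 if n % 2 else factorial(n) // (2 ** (n // 2) * factorial(n // 2))


def beta_by_integration(j, k, a3, a4):
    """Integral of H_j H_k {1 + a3 H_3/3! + a4 H_4/4!} g(x) dx, done exactly."""
    factor = [Fraction(0)] * 5
    factor[0] = Fraction(1)
    for i, c in enumerate(hermite(3)):
        factor[i] += a3 * c / 6
    for i, c in enumerate(hermite(4)):
        factor[i] += a4 * c / 24
    prod = poly_mul(poly_mul(hermite(j), hermite(k)), factor)
    return sum(c * normal_moment(i) for i, c in enumerate(prod))


def beta_closed(j, d, a3, a4):
    """Closed form for beta_{j, j+d}."""
    if d == 0:
        return factorial(j) * (1 + Fraction(j * (j - 1), 4) * a4)
    if d == 1:
        return j * factorial(j + 1) * a3 / 2
    if d == 2:
        return j * factorial(j + 2) * a4 / 6
    if d == 3:
        return factorial(j + 3) * a3 / 6
    if d == 4:
        return factorial(j + 4) * a4 / 24
    return Fraction(0)


a3, a4 = Fraction(1, 3), Fraction(1, 2)
b5 = beta_closed(5, 0, a3, a4)   # equals 120 + 600*a4
```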
Substituting in (2) and eliminating as many of the border elements as possible we are led to 








20 
ei | I | = lim K,_5(B;, Cs, ds,fs, 9s) (15) 


ic 8 @ K,(bo; Cy, Ip, fo, Jo) . 








= 120+ 600a, — 200a3 + 200a2a, — 25a, 
Cy = ¢5— 10059, — 15a, f, + 75aj¢, + 150a,a,b, 
= 1800a, + 225a,a} — 300a,a,, ‘ (16) 
= 720 + 5400a, — 450a} + 225a3, 


B,=6,, s>6, C,=c,, 8>65, 





where $K_s(p_k, q_l, \ldots)$ is a symmetric determinant of order $s$ with elements

$p_j$ $(j = k, k+1, k+2, \ldots)$, in the diagonal through (1, 1),

$q_j$ $(j = l, l+1, l+2, \ldots)$, in the diagonal through (2, 1), and so on,
all unspecified elements being zero.* Since we may write $V$ given by (10) as $K_5(b_0, c_0, d_0, f_0, g_0)$
we have for the joint efficiency of moment estimates for the four parameters of Type A in
large sampling
$$E_4 \le \frac{K_s(b_0, c_0, d_0, f_0, g_0)}{K_5(b_0, c_0, d_0, f_0, g_0)\,K_{s-5}(B_5, C_5, d_5, f_5, g_5)}, \qquad (17)$$
the ratio forming a decreasing sequence and the equality sign being applicable in the limit
as $s \to \infty$. Thus for a particular value of $s$, (17) gives an upper bound for $E_4$. For computational
purposes we may use $s = 6$ or $s = 7$ in (17), and the determinants $K_j(b_0, c_0, d_0, f_0, g_0)$, $j = 5, 6, 7$
may be simplified to become the first $j - 2$ rows and columns of the array (18) below:

















[2+a,—a3 6a, —a,0, —8+4a, — 10a, 30a, : 

—8+4a,  12a,—a,a, 56+24a,—a3 100a,+5a,a, 360a, (18) 
—10a, —30+15a,+5a2 100a,+5a,a, 270+225a,—25a2 12000, 
| 30a, 120a, 360a, 1200a, 720 + 5400a, | 
Table 2. Values of $E_4$ in the joint estimation of four parameters for Type A

a_3^2 \ a_4 | 0.05 | 0.1  | 0.2  | 0.3  | 0.4  | 0.5  | 0.6  | 0.7  | 0.8  | 0.9  | 1.0  | 1.5  | 2.0
0.0         | 0.99 | 0.97 | 0.92 | 0.88 | 0.84 | 0.81 | 0.78 | 0.75 | 0.73 | 0.71 | 0.69 | 0.63 | 0.59
0.1         |  —   |  —   |  —   | 0.60 | 0.70 | 0.73 | 0.72 | 0.71 | 0.69 | 0.68 | 0.66 | 0.60 | 0.57
0.2         |  —   |  —   |  —   |  —   |  —   | 0.57 | 0.64 | 0.66 | 0.65 | 0.65 | 0.63 | 0.58 | 0.54
0.3         |  —   |  —   |  —   |  —   |  —   |  —   | 0.48 | 0.58 | 0.61 | 0.61 | 0.60 | 0.56 | 0.52
oij-—i—i—i—i—i]— I — Tt — | eee] Of | oa | oes i Om 
SS oe oe ee ne ne ne es ene ers re 
eb me hee hee bee me hee | oe 2 oe Be i oe 
1 we Ce Ce oe ne ee ee ee ee eee 
See eS a een ee oe oS ese eee 
Fe ens oS ee Oe ee ee ere 
ee ee ee ee ee te ee le eee eee ee 


















































In Table 2 we give values of (17) for $s = 6$ for admissible values of the parameters $a_3^2$ and $a_4$
(or $\gamma_1^2$ and $\gamma_2$) so that $E_4$ is certainly less than the figure shown. From Table 2 it is seen that
in the vicinity of the normal point $E_4$ approaches unity; this is also suggested from (17), since

* $K_s(b_0, c_0, d_0, f_0, g_0)$ is a symmetric determinant with elements in nine diagonals only; it may
therefore be regarded as a generalized continuant and the notation is suggested by Muir's (1882, pp. 149-60)
for a continuant determinant.












from (13) $c$, $d$, $f$, $g$ tend to zero with $a_3$ and $a_4$. Type A is therefore similar to the Pearson
system in this respect. Further, it appears that $E_4 < 80\%$ for $0.6 < a_4 < 4$ and $a_3^2 > 0.1$ and low
efficiency is to be expected in the vicinity of the Type A boundary shown in Fig. 1. Although
the efficiency seems to increase for increasing $a_4$ (with $a_3^2 = 0.1$ say) this may be accounted
for by the degree of approximation involved. To calculate further terms of (17) becomes
laborious, but the following additional values assist in fixing the approximate position of
the bound for the 80% efficiency contour: $a_4 = 0.1$, $a_3^2 = 0.01$, $E_4 < 0.8950$ with $s = 6$,
$E_4 < 0.8763$ with $s = 7$; $a_4 = 0.2$, $a_3^2 = 0.05$, $E_4 < 0.7029$ with $s = 6$. However in the case
when $a_3 = 0$ the computation may be carried further, since in this case the determinants in
(17) now factorize (using Laplacian expansion) and we find








, Se K.5(bo» do, Jo) Ko(b1, 41,91) a (19) 
ie: 3(59, Io, Jo) Kz_3( Be, de, 9¢) Ko(by, 43,91) Ky_o( Bs, 45; 9s) 
» K,41(b9; 49, Jo) K (61, 41; 91) 
or B.<= _—— SHINO) 0? FO and) 12 P17 _ , (20) 
. 3(59, do; Jo) Kg-2( Bg, dg, 9g) Ko(b1, 21,91) Kg_o( Bs, 45, 95) 
where $B_s = b_s$, $s > 6$,

$$B_5 = 120 + 600a_4 - 25a_4^2, \quad B_6 = 720 + 5400a_4 - 450a_4^2$$
and $b$, $d$, $g$ are given in (14). The two expressions arise according as $s$ is even or odd in (17).
We use the notation $K_s(p_k, q_l, \ldots)$ to indicate a symmetric determinant of order $s$ with elements

$p_j$ $(j = k, k+2, k+4, \ldots)$, in the diagonal through (1, 1),

$q_j$ $(j = l, l+2, l+4, \ldots)$, in the diagonal through (2, 1), and so on,

all unspecified elements being zero. Thus, for example,

$$K_3(b_0, d_0, g_0) = \begin{vmatrix} b_0 & d_0 & g_0 \\ d_0 & b_2 & d_2 \\ g_0 & d_2 & b_4 \end{vmatrix}, \qquad K_2(B_5, d_5, g_5) = \begin{vmatrix} B_5 & d_5 \\ d_5 & b_7 \end{vmatrix}.$$

(19) and (20) give upper bounds for $E_4$ and decreasing sequences and taking $s = 3, 4, 5$ we
have the values in Table 3.


Table 3. Successive values for $E_4$ in the joint estimation of four parameters for Type A
in the case $a_3 = 0$

a_4         | 0.05 | 0.1  | 0.2  | 0.3  | 0.4  | 0.5  | 0.6  | 0.7  | 0.8  | 0.9  | 1.0  | 1.5  | 2.0
1st approx. | 0.99 | 0.97 | 0.92 | 0.88 | 0.84 | 0.81 | 0.78 | 0.75 | 0.73 | 0.71 | 0.69 | 0.63 | 0.59
2nd approx. | 0.97 | 0.92 | 0.81 | 0.72 | 0.65 | 0.59 | 0.55 | 0.51 | 0.48 | 0.45 | 0.43 | 0.34 | 0.28
3rd approx. | 0.97 | 0.92 | 0.81 | 0.72 | 0.65 | 0.59 | 0.54 | 0.50 | 0.46 | 0.43 | 0.41 | 0.31 | 0.25
4th approx. | 0.97 | 0.92 | 0.81 | 0.71 | 0.63 | 0.57 | 0.52 | 0.47 | 0.43 | 0.40 | 0.37 | 0.26 | 0.19
5th approx. | 0.97 | 0.91 | 0.80 | 0.71 | 0.63 | 0.57 | 0.52 | 0.47 | 0.43 | 0.40 | 0.37 | 0.26 | 0.19
6th approx. | 0.97 | 0.91 | 0.80 | 0.71 | 0.63 | 0.57 | 0.52 | 0.47 | 0.43 | 0.40 | 0.37 | 0.25 | 0.17















































The values in Table 3 are given correct to two figures; the first column of values correct to
four figures reads: 0.9908, 0.9703, 0.9695, 0.9688, 0.9685 and 0.9680. The capricious behaviour
of the successive approximations is noteworthy, so that although two successive terms may
be nearly equal this provides little basis for supposing the remainder negligible. From
Tables 2 and 3 it can now be seen that $E_4 < 80\%$ for $0.2 < a_4 < 4.0$, $a_3^2 > 0.1$, and since













$E_4 < 0.7029$ for $a_4 = 0.2$, $a_3^2 = 0.05$, it would appear that the 80% efficiency contour is not
in the region $a_3^2 > 0.05$, $0.2 < a_4 < 4.0$. This may be compared with the case for Pearson's
Type IV distribution, except that the general case is one of some complexity. Taking
Type IV in the form
$$f(X)\,dX = y_0\{1 + x^2\}^{-\frac12(r+2)}e^{-\nu\tan^{-1}x}\,dX, \qquad (21)$$
where
$$x = \frac{X - m}{c}, \quad y_0 = \frac{1}{cF(r, \nu)}, \quad F(r, \nu) = e^{-\frac12\nu\pi}\int_0^{\pi}\sin^r\theta\,e^{\nu\theta}\,d\theta,$$
the moments following the well-known recurrence relation
$$\mu_s = -\frac{2\nu c(s-1)}{r(r-s+1)}\,\mu_{s-1} + \frac{(s-1)c^2(\nu^2 + r^2)}{r^2(r-s+1)}\,\mu_{s-2}, \qquad (22)$$
with $\mu_0 = 1$, $\mu_1 = 0$, $s \ge 2$; we have for the other moments up to $\mu_8$:
$$\mu_2 = c^2R^2/rr^{(2)},$$
$$\mu_3 = -4c^3\nu R^2/r^2r^{(3)},$$
$$\mu_4 = 3c^4R^2\{(r-2)R^2 + 8\nu^2\}/r^3r^{(4)},$$
$$\mu_5 = -8c^5\nu R^2\{(5r-12)R^2 + 24\nu^2\}/r^4r^{(5)},$$
$$\mu_6 = 5c^6R^2\{3(r-2)(r-4)R^4 + 8(13r-36)R^2\nu^2 + 384\nu^4\}/r^5r^{(6)}, \qquad (23)$$
$$\mu_7 = -12c^7\nu R^2\{(35r^2 - 238r + 360)R^4 + 8(77r-240)R^2\nu^2 + 1920\nu^4\}/r^6r^{(7)},$$
$$\mu_8 = 7c^8R^2\{15(r-2)(r-4)(r-6)R^6 + 8(170r^2 - 1284r + 2160)R^4\nu^2 + 576(29r-100)R^2\nu^4 + 46080\nu^6\}/r^7r^{(8)};$$
where $r^{(s)} = r(r-1)\ldots(r-s+1)$ and $R^2 = r^2 + \nu^2$.
The information matrix, given by Fisher (1921) is 





(22) 











r(r+1)(r+2)(r+4)A —(r+1)(r+2)vA (r+1)(r+2)B —€+1)9B) 
—(r+1)(r+2)vA = (r+3)(2r+8+y*)A —-(r+1)vyB (r+2+v*)B 
I= (r+1)(r+2)B —(r+1)vB Slog F ax log F » (24) 
@ ? 
| —(r+1)vB (r+2+v)B Spor 08F 5,alog F 
where At =c8{(r+4)?+v%}, Bol = e{(r+2)?+y3} 


It is evident from the form of the higher moments and the derivatives in $I$ that $E_4$ for joint
estimation would be rather complicated. However, a comparison with Type A may be
made in the special case $\nu = 0$. This situation may arise when we set out to fit a Type IV
curve to a sample from a population in which $\nu = 0$, and where this fact is not known before
the sample is drawn. The case should be distinguished from the one of simultaneous
estimation of three parameters in a Type VII distribution which we consider in §5.
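The higher moments of Type IV can be generated mechanically from the recurrence relation rather than copied by hand. The sketch below works in exact rational arithmetic and compares the recurrence with closed forms of the kind listed in (23). It is a check under the stated recurrence only; the function names are illustrative, and the closed-form coefficients coded here are the ones the recurrence itself produces.

```python
from fractions import Fraction


def falling(r, s):
    """r^(s) = r (r - 1) ... (r - s + 1)."""
    out = Fraction(1)
    for i in range(s):
        out *= r - i
    return out


def moments_by_recurrence(r, nu, c, top=8):
    """mu_0 .. mu_top from the Type IV recurrence with mu_0 = 1, mu_1 = 0."""
    R2 = r * r + nu * nu
    mu = [Fraction(1), Fraction(0)]
    for s in range(2, top + 1):
        d = r - s + 1
        mu.append(-2 * nu * c * (s - 1) / (r * d) * mu[s - 1]
                  + (s - 1) * c * c * R2 / (r * r * d) * mu[s - 2])
    return mu


def moments_closed(r, nu, c):
    """Closed forms for mu_2 .. mu_8, returned as a dict keyed by order."""
    R2, n2 = r * r + nu * nu, nu * nu
    bracket = {
        2: 1,
        3: -4 * nu,
        4: 3 * ((r - 2) * R2 + 8 * n2),
        5: -8 * nu * ((5 * r - 12) * R2 + 24 * n2),
        6: 5 * (3 * (r - 2) * (r - 4) * R2 ** 2 + 8 * (13 * r - 36) * R2 * n2
                + 384 * n2 ** 2),
        7: -12 * nu * ((35 * r * r - 238 * r + 360) * R2 ** 2
                       + 8 * (77 * r - 240) * R2 * n2 + 1920 * n2 ** 2),
        8: 7 * (15 * (r - 2) * (r - 4) * (r - 6) * R2 ** 3
                + 8 * (170 * r * r - 1284 * r + 2160) * R2 ** 2 * n2
                + 576 * (29 * r - 100) * R2 * n2 ** 2 + 46080 * n2 ** 3),
    }
    return {s: c ** s * R2 * bracket[s] / (r ** (s - 1) * falling(r, s))
            for s in bracket}


r, nu, c = Fraction(20), Fraction(3), Fraction(2)
recurrence = moments_by_recurrence(r, nu, c)
closed = moments_closed(r, nu, c)
```

Setting `nu` to zero gives the symmetrical case, where the odd moments vanish and $\mu_4/\mu_2^2 = 3(r-1)/(r-3)$.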

Using (23) we find 2880%y3(r — 2) 











N4 | [cov (m,,m,)] | - (r—1)8(r—3)5 (r —5)3 (r— 7)’ (25) 
OM a» Har Marls)| 48c8 
Bm, ¢, me) w-o r(r—1)*(r—2) (r—3)?" (26) 
From (24) we have | Z| PQ 


t| = Gea (r4 ae wn 










where P = (r+ 2)3 F(4r) — 2(r + 1) (r +4), 
Q = (r +1)? (r + 2)? {F[4(r — 1)] — F(4r)} — 2(r + 1) (r +4), 
$F(x) = \dfrac{d^2}{dx^2}\log x! = \psi'(x+1)$ in the notation of the trigamma function.*
Hence using (25)-(27) we have for the efficiency of the method of moments in the joint 


estimation of the four parameters of Type IV 


32(r — 3) (r — 5)8 (r —7) (r + 2)4 (r+ 4)? 


= r(r—1)3(r—2)7>PQ 





when pv = 0. (28) 


Table 4. Comparison of $E_4$ for Type A and Type IV in the special case $\gamma_1 = 0$

$\gamma_2$ | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 | 1.5
Type IV | 0.99 | 0.98 | 0.92 | 0.84 | 0.74 | 0.63 | 0.53 | 0.43 | 0.33 | 0.25 | 0.18 | 0
Type A | 0.97 | 0.91 | 0.80 | 0.71 | 0.63 | 0.57 | 0.52 | 0.47 | 0.43 | 0.40 | 0.37 | 0.25

For comparison with the corresponding case for Type A we must have $r = 3+6/\gamma_2$, and the
values of $E_4$ are shown in Table 4. Since the entry for Type A is an upper bound we may
conclude that higher efficiency occurs whenever $0<\gamma_2<0.6$ with moment estimation for
Type IV than for Type A, and for $\gamma_2>0.6$ there may be cases where the inefficiency for
Type A is not as high as for Type IV. Type IV has $E_4 < 80\%$ for $0.35 < \gamma_2 < 1.5$ while, for
Type A, $E_4$ is certainly below 80% for $0.2 <\gamma_2< 4.0$.
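
The Type IV row of Table 4 can be checked numerically from (28). The following is a minimal sketch, assuming that $F(x)$ is the trigamma function $\psi'(x+1)$ and that $r = 3+6/\gamma_2$; the helper names `trigamma`, `F` and `e4_type_iv` are illustrative only and do not come from the paper.

```python
def trigamma(x):
    """psi'(x) for x > 0, by upward recurrence followed by an asymptotic series."""
    total = 0.0
    while x < 15.0:                      # psi'(x) = psi'(x + 1) + 1/x^2
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    series = inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 / 30)))
    return total + series


def F(x):
    """F(x) = d^2/dx^2 log x! = psi'(x + 1), as assumed for (27)."""
    return trigamma(x + 1.0)


def e4_type_iv(r):
    """Joint efficiency (28) of the method of moments for Type IV at nu = 0."""
    P = (r + 2) ** 3 * F(r / 2) - 2 * (r + 1) * (r + 4)
    Q = ((r + 1) ** 2 * (r + 2) ** 2 * (F((r - 1) / 2) - F(r / 2))
         - 2 * (r + 1) * (r + 4))
    num = 32 * (r - 3) * (r - 5) ** 3 * (r - 7) * (r + 2) ** 4 * (r + 4) ** 2
    den = r ** 5 * (r - 1) ** 3 * (r - 2) ** 3 * P * Q
    return num / den


if __name__ == "__main__":
    for gamma2 in (0.2, 0.5, 1.0, 1.5):
        r = 3 + 6 / gamma2               # same first four moments as Type A
        print(gamma2, round(e4_type_iv(r), 2))
```

The factor $(r-7)$ makes the efficiency vanish at $\gamma_2 = 1.5$ ($r = 7$), where the eighth moment ceases to exist.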


5. THE SYMMETRICAL TYPE A AND COMPARISON WITH PEARSON'S TYPE VII


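We now consider the case of Type A when $a_3 = 0$. Writing

$$P_X\,dX = \Big\{1+\frac{a_4}{4!}H_4(x)\Big\}\,g(x)\,dx,\qquad x=\frac{X-m}{\sigma}, \tag{29}$$

we have

$$\left.\begin{aligned}
\frac{\partial P_X}{\partial m} &= \frac{1}{\sigma^2}\Big\{H_1(x)+\frac{a_4}{4!}H_5(x)\Big\}g(x),\\
\frac{\partial P_X}{\partial \sigma} &= \frac{1}{\sigma^2}\Big\{H_2(x)+\frac{a_4}{4!}\big(4H_4(x)+H_6(x)\big)\Big\}g(x),\\
\frac{\partial P_X}{\partial a_4} &= \frac{H_4(x)\,g(x)}{4!\,\sigma}.
\end{aligned}\right\} \tag{30}$$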





It is evident from the fact that the first of the equations in (30) is an odd function of $x$ that
for maximum likelihood estimators $\hat m$ is uncorrelated with $\hat\sigma$ and $\hat a_4$. The efficiency of the
method of moments in this case then can be considered as follows:

Case A. Efficiency of the mean $\bar m$ (or $m_1$).

Case B. (i) Efficiency of $\bar\sigma$ given $a_4$, where $\bar\sigma^2 = m_2$. (ii) Efficiency of $\bar a_4$ given $\sigma$.

Case C. Efficiency of $\bar\sigma$ and $\bar a_4$, where $\bar\sigma^2 = m_2$ and $\bar a_4+3 = m_4/m_2^2$.

* The F notation is due to Pairman, Tracts for Computers, No. 1; the psi function notation is used
by H. T. Davis, Tables of the Higher Mathematical Functions, Vol. II, Principia Press, Inc.




Here we have used $m_r$ for the $r$th moment of the sample of $N$ about its mean $m_1$. Cases
A and C are the most important in practice. Case B (i) may have interest for it corresponds
to the efficiency of the estimation of $c$ when $r$ is known for the Type VII curve ($\nu = 0$ in (21)).
Jeffreys (1939b, p. 708) has indicated that with data of observational errors, when the sample
size $N$ is less than 500, the estimate of the index $(r+2)/2$ is unreliable and a value 4 may be
used in such cases. The parallel case with the symmetrical Type A would be to take $a_4 = 2$.
Case B (ii) is not likely to occur in practice.


Case A. Efficiency of the mean 


2 
We have oes = F (5 a) and Nvarm = o?, 
m x 


so that in (2) we take A,(x) g(x) = 0P,/0m and following our previous method find 
K,(b1, 41,91) 


EM) < gy Se. a3, (31) 
; K,_ (Bs, Ds, 95) 
D, = 60a, — 5aj, D,=4d,, 83; 


using the notation of (14), (19) and (20). But we also have 


pa r {Hi (x) + (04/4!) He(x)}* 9(x) 
Ej) Jo 1+ (a,/4!) H(z) 


_ , , 2a, a(4—a,)(° — Ay(x)g(zx) 
iis Maite ol” ee ” 








, - 2a, a(4—a,) K, _4 (bs, d5,95)\ 
which leads to Emm) > {1+ 4 “as | ‘ 32 
' 3 8 K ,(b;, 4,93) (82) 





in the same notation as (31). From (31) and (32) upper and lower bounds may be found for
$E_s(\bar m)$. If we wish to make a comparison with Pearson's Type VII we find from (21), (23) and
(24) with $\nu = 0$, $E(\bar m) = (r-1)(r+4)/\{(r+1)(r+2)\}$. With $s = 5$ in (31) and (32) we thus
find the values in Table 5. Interpolating in the table it is seen that $E_s(\bar m) > 80\%$ for
$0<\gamma_2<1.5$, and $E_s(\bar m) < 50\%$ for $3.0<\gamma_2<4.0$. For Type VII $E(\bar m) > 80\%$ for $0<\gamma_2<6.0$
and is certainly higher than is the case for Type A. It may be noted that for Type A, since
var (median) $= 1/(4NO^2)$, where $O$ is the median ordinate, $\{$var (median)/var $(\bar m)\} = 32\pi/(8+a_4)^2$,
which is less than unity for $a_4 > 2$; thus for $a_4 = 3$, $0.55 < E_s(\text{median}) < 0.66$. Hence for $a_4 > 2$
the median is a slightly better estimator than $\bar m$.
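
The Type VII row of Table 5 and the median comparison follow directly from the two closed forms just quoted. A minimal sketch, assuming the correspondence $r = 3+6/\gamma_2$ used for Table 4 (function names are illustrative):

```python
import math


def type_vii_mean_efficiency(gamma2):
    """E(m) = (r - 1)(r + 4) / {(r + 1)(r + 2)} with r = 3 + 6 / gamma2."""
    r = 3.0 + 6.0 / gamma2
    return (r - 1) * (r + 4) / ((r + 1) * (r + 2))


def median_to_mean_variance_ratio(a4):
    """var(median) / var(mean) = 32 pi / (8 + a4)^2 for the symmetrical Type A."""
    return 32 * math.pi / (8 + a4) ** 2


if __name__ == "__main__":
    for gamma2 in (0.05, 0.2, 1.0, 3.0):
        print(gamma2, round(type_vii_mean_efficiency(gamma2), 4))
    # The ratio drops below unity a little above a4 = 2.
    print(median_to_mean_variance_ratio(2.0), median_to_mean_variance_ratio(3.0))
```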


Table 5. $E_s(\bar m)$ for Type A and Type VII

$\gamma_2$ | 0.05 | 0.2 | 0.4 | 0.6 | 0.8 | 1.0 | 1.5 | 2.0 | 3.0
Type A (i)* | 0.9996 | 0.9944 | 0.9799 | 0.9580 | 0.9299 | 0.8965 | 0.7951 | 0.6795 | 0.4527
Type A (ii) | 0.9996 | 0.9944 | 0.9801 | 0.9589 | 0.9324 | 0.9018 | 0.8141 | 0.7209 | 0.5493
Type VII | 0.9996 | 0.9950 | 0.9842 | 0.9714 | 0.9583 | 0.9455 | 0.9167 | 0.8929 | 0.8571

* (i) and (ii) refer to lower and upper bounds respectively.





Case B (i). Efficiency of $\bar\sigma$ given $a_4$


o . 4K, (By, D4, 94) 


Nvaré K,(bo,d,9o) 


where B, = b,—4a,d, + 4a3b, —g3 = 24+ 72a, — 25a? + 4a, 


From (30) and (2) we have 





B,=6,, s>6, D,=d,, 8>4; 


and the other symbols are those given in (14) and (20), so that since 4N var@ = (a,+ 2) o?, 


we have se 
Efe | a4) on K (bo, do, Jo) . (33) 
K (bo, do, Jo) Ks_2( Ba, Da, 9a) 





A lower bound may also be found, for from 


. {H,(2) + (a4/3!) Hy(x) + (aq/4!) Hy (x)}? 
—© 1+ (a,/4!) H(z) 





g(x) dx 














ee ~ 4a ee [ {H,(x) + 1}* 
hie ee PSE AT? aca 
os 4 _ (4-4) K,_(By, De, g2)\~ 
there follows E,(G | a4) > Sta, {2 +% r Tao (34) 
where B, = 6,+4b5 =6+a, B,=6,, 8>2, 
D, = dg— 299 = 6a,, D, =d,, . 8>2, 
and the other symbols are those used in (33). 
Table 6. $E_s(\bar\sigma\mid a_4)$ for Type A and the corresponding case for Type VII

$\gamma_2$ | 0.05 | 0.2 | 0.4 | 0.6 | 0.8 | 1.0 | 1.5 | 2.0 | 3.0
Type A (i)* | 0.9993 | 0.9920 | 0.9747 | 0.9500 | 0.9173 | 0.8759 | 0.7369 | 0.5684 | 0.2747
Type A (ii) | 0.9993 | 0.9920 | 0.9752 | 0.9525 | 0.9243 | 0.8908 | 0.7880 | 0.6704 | 0.4514
Type VII | 0.9992 | 0.9893 | 0.9649 | 0.9341 | 0.9006 | 0.8667 | 0.7857 | 0.7143 | 0.6000

* (i) and (ii) refer to lower and upper bounds respectively.


For the corresponding problem for Type VII when $r$ is given, it follows since $\mu_2 = c^2/(r-1)$
that $N\operatorname{var}\bar c = c^2r/\{2(r-3)\}$, and from (24) $N\operatorname{var}\hat c = c^2(r+4)/\{2(r+1)\}$. Hence for the
efficiency of $\bar c$ given $r$ we have $E(\bar c\mid r) = (r-3)(r+4)/\{r(r+1)\}$. The comparison with Type A
is given in Table 6, taking $s = 6$ in (33) and (34). It will be seen that $E_s(\bar\sigma\mid a_4) > 80\%$ for
$0<\gamma_2<1.0$ and $<60\%$ for $2.0<\gamma_2<4.0$ approximately. For $0<\gamma_2<1.5$, $E_s$ is higher for
Type A than for Type VII, but the difference is not marked. It may be noted that in the
range $2.0<\gamma_2<4.0$ the efficiency is in the region of 70% or below in both cases and is
unsatisfactory; for Type VII, however, it has been shown by Sichel (1949) that the method
of frequency-moments has a highly satisfactory efficiency.
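
As a check on the Type VII row of Table 6, the efficiency is simply the ratio of the two large-sample variances quoted above. A minimal sketch, with $c$ set to 1 since it cancels, and with illustrative function names:

```python
def n_var_c_moments(r, c=1.0):
    """N var of the moment estimator of c when r is given: c^2 r / {2 (r - 3)}."""
    return c * c * r / (2 * (r - 3))


def n_var_c_ml(r, c=1.0):
    """N var of the maximum likelihood estimator of c from (24) at nu = 0."""
    return c * c * (r + 4) / (2 * (r + 1))


def type_vii_scale_efficiency(gamma2):
    """E(c | r) = (r - 3)(r + 4) / {r (r + 1)}, with r = 3 + 6 / gamma2."""
    r = 3.0 + 6.0 / gamma2
    return n_var_c_ml(r) / n_var_c_moments(r)


if __name__ == "__main__":
    for gamma2 in (0.05, 0.2, 1.0, 3.0):
        print(gamma2, round(type_vii_scale_efficiency(gamma2), 4))
```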

Case B (ii). Efficiency of $\bar a_4$ given $\sigma$

For the maximum likelihood estimator $\hat a_4$ we have














1 a g(x) dx a 35 
N vara, all 1+ (a,/4!) H,(x) i) — | 
i K (bz, 4a, 92) ot 36 
‘ ay a do, Jo) ‘ (36) ; 
ae 1 _(° _{H,(z)- 2}? d } F 
= Nvara, 644(4—4,) ( 1s + (a4/4!) Ha)? | ; 


<p ings (o-2 SPP "a 
3a,(4— 4) K,41(bo, Iq 9o)) ° 
where By, = bg+bo=3+a,, Be=b,, 8>2; 
Dg = d,+99 = 9a,, Di, =d,, 8>2; 
and b,d,g are otherwise given in (14). 
For the estimation of $a_4$ by moments we may use
(a) $\dot a_4+3 = m_4/\sigma^4$, (b) $\tilde a_4 = (m_4-6m_2\sigma^2+3\sigma^4)/\sigma^4$, (c) $\bar a_4+3 = m_4/m_2^2$,
in which $m_r = \Sigma(X-\bar X)^r/N$, $r = 2, 4$, and in large sampling we have approximately
$$N\operatorname{var}\dot a_4 = 96+204a_4-a_4^2,$$
$$N\operatorname{var}\tilde a_4 = 24+72a_4-a_4^2,$$
$$N\operatorname{var}\bar a_4 = 24+72a_4-25a_4^2+4a_4^3.$$
It is evident then that $\dot a_4$ is a poor estimator and indeed $E_s(\dot a_4\mid\sigma)\to\tfrac14$ for small $a_4$. The
efficiency for the other two is given in Table 7, using $s = 4$ in (36) and (37).
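
The three large-sample variances can be recovered from the standardized moments of the symmetrical Type A density. The sketch below assumes the usual first-order (delta-method) variance formulas for sample moments and the identity $E\{x^nH_4(x)\} = n(n-1)(n-2)(n-3)E(x^{n-4})$ for a normal variate; the function names are illustrative and not taken from the paper.

```python
from math import factorial


def normal_moment(n):
    """E[x^n] for a standard normal variate (zero when n is odd)."""
    if n % 2:
        return 0
    return factorial(n) // (2 ** (n // 2) * factorial(n // 2))


def type_a_moment(n, a4):
    """nth standardized moment of {1 + (a4/4!) H4(x)} g(x)."""
    if n < 4:
        return normal_moment(n)
    falling = n * (n - 1) * (n - 2) * (n - 3)
    return normal_moment(n) + a4 * falling * normal_moment(n - 4) / 24


def n_var_estimators(a4):
    """Large-sample N var for the estimators (a), (b) and (c) of a4."""
    b4, b6, b8 = (type_a_moment(n, a4) for n in (4, 6, 8))
    var_a = b8 - b4 ** 2                                   # m4 / sigma^4
    var_b = var_a - 12 * (b6 - b4) + 36 * (b4 - 1)         # (m4 - 6 m2 sigma^2 + 3 sigma^4) / sigma^4
    var_c = b8 - 4 * b4 * b6 + 4 * b4 ** 3 - b4 ** 2       # m4 / m2^2
    return var_a, var_b, var_c


if __name__ == "__main__":
    print(n_var_estimators(0.0))   # at the normal point (a) has four times the variance of (b), (c)
    print(n_var_estimators(0.4))
```

At $a_4 = 0$ estimator (a) has variance 96 against 24 for the other two, which is the limiting efficiency of one quarter noted in the text.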


Table 7. $E_s(\tilde a_4\mid\sigma)$ and $E_s(\bar a_4\mid\sigma)$ for Type A

$a_4$ | 0.05 | 0.2 | 0.4 | 0.6 | 0.8 | 1.0 | 1.5 | 2.0 | 3.0
$E_s(\tilde a_4\mid\sigma)$ (i)* | 0.9753 | 0.8358 | 0.6722 | 0.5511 | 0.4602 | 0.3895 | 0.2660 | 0.1843 | 0.0771
$E_s(\tilde a_4\mid\sigma)$ (ii) | 0.9759 | 0.8456 | 0.6980 | 0.5905 | 0.5101 | 0.4475 | 0.3373 | 0.2635 | 0.1660
$E_s(\bar a_4\mid\sigma)$ (i) | 0.9755 | 0.8565 | 0.7213 | 0.6237 | 0.5508 | 0.4934 | 0.3867 | 0.3023 | 0.1448
$E_s(\bar a_4\mid\sigma)$ (ii) | 0.9761 | 0.8666 | [illegible] | [illegible] | 0.6105 | 0.5668 | 0.4904 | 0.4321 | 0.3118

* (i) and (ii) refer to lower and upper bounds respectively.


Thus in either case the efficiency is below 80% for $a_4>0.3$ approximately, and although
$\bar a_4$ is a better estimator than $\tilde a_4$ it cannot be regarded as satisfactory for a departure from
normality exceeding $a_4$ or $\gamma_2 = 0.4$. In other words, in large sampling from a Type A population
with significant leptokurtosis, for which the parameters of scale and location are known
(or the scale parameter alone is known) the method of moments should in general only be
used as a first approximation.



Case C. The efficiency of the joint estimators $\bar\sigma$ and $\bar a_4$
Without going into details it is found using the previous methods that


a K,,,3(b, do, Jo) 
EYG,a )< = 8+3\"0? “0 0 : (38) 
. , K4(bo, do, Jo) K (Bg, de, Ye) 


with B,=6,, 8>6, 








the other symbols being given in (14) and (20). For the corresponding problem with Type VII,
using (23) and (24),

$$E(\bar c,\bar r) = \frac{6(r+4)(r+2)^2(r+1)(r-5)^2(r-7)}{r^2(r-1)^3(r-2)(r-3)Q}, \tag{39}$$

where $Q$ is given in (27), and $r = 3+6/a_4$. Taking $s = 3$ in (38) we find the values given in
Table 8. For Type A then $E_s(\bar\sigma,\bar a_4)<80\%$ for $0.4<a_4<4.0$, and less than 50% for
$1.5<a_4<4.0$. On the other hand, the efficiency for Type VII is higher than for Type A for
$0<a_4<0.5$ and perhaps a little outside this range, since the Type A entries are upper bounds.
Thus for the range $0<\gamma_2<0.5$ Type VII is nearer Fisher's system of maximum moment
efficiency (the exponential of a quartic) than is Type A.



Table 8. $E_s(\bar\sigma,\bar a_4)$ for Type A and $E(\bar c,\bar r)$ for Type VII

$a_4$ | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 | 1.5 | 2.0 | 3.0
Type A | 0.978 | 0.942 | 0.870 | 0.809 | 0.758 | 0.715 | 0.677 | 0.643 | 0.613 | 0.585 | 0.558 | 0.441 | 0.337 | 0.163
Type VII | 0.996 | 0.986 | 0.947 | 0.888 | 0.815 | 0.731 | 0.642 | 0.550 | 0.460 | 0.372 | 0.291 | 0.000 | — | —



It may be worth noting that an upper bound may also be found for the efficiency, in joint
estimation, of either $\bar\sigma$ or $\bar a_4$. In the case of $\bar a_4$ we have
A 10P 
tie a N vara, = E (550) ‘[4. 
10P 1 aPaP 
a : #(55q) # (Pascaa, 
iby where "alee a (32) “ 
P?0a,00 Pia 
ip ’ 10P _ 4K, 9 (By, Dy, 94) 
71 so that we find Ep =) = lim o-2 ee 40 
v1 so that we fin Pio lim OK (by dos Go) (40) 
48 i and = lim 4K, (Bg, dg, 9s) (41) 
18 sao 2K (bo, do, Jo) 
= in the notation of (14) and (33). Moreover, since @,+3 = m,/m2 so that 
ough N var @, = 24+ 72a,— 25a? + 4a? = B, 
from K, «(By Dg, 94)\ .: K(bp; dos Go) 
we fund E,G,) = lim {2-8 Be Dads him Fan (42) 
pula- a) so | By K (bo, do, Jo) } 8’ +0 Ky_g(Bg, dg, Jo) 
nown "4 
K(By, Dy, 9s) 
lybe | = lim (43) 
y 8—>o K,(B,) K,_, (Bg; dg, 96)’ 
| since neither of the limits in (42) is zero. Further (43) may be written 
B, 0 0 
By, Dy % zB. D 
» as 4 4 94 
(38) ’ ry 5 ‘2, & «© ; 
94 Me 4 ae a * 
’ wi : s st+1 






















and so is a Schwein's determinantal ratio inverted; since the $K_s(B_4, D_4, g_4)$ are essentially
positive (43) therefore gives a decreasing sequence. As an illustration of this feature the values
of $E_s(\bar a_4)$ for $a_4 = 1$ and $s = 1, 2, 3$ and 4 are 1.0000, 0.6560, 0.6165 and 0.6158. When $s = 4$ the
values of $E_s(\bar a_4)$ are given in Table 9. The last row gives the corresponding value for Type VII
found from (23) and (24) as

$$E(\bar r) = \frac{6(r+2)^2(r+1)^2(r-5)(r-7)}{r(r-1)^2(r-3)(r^2-r+18)Q}.$$

For Type A it is seen that $E_s(\bar a_4) < 80\%$ for $0.4 < \gamma_2 < 4.0$ and less than 50% for $2.0<\gamma_2<4.0$
approximately. $E(\bar r)$ for Type VII is higher in $0<\gamma_2<0.4$.





Table 9. Efficiency of $\bar a_4$ in joint estimation

$\gamma_2$ | 0.05 | 0.2 | 0.4 | 0.6 | 0.8 | 1.0 | 1.5 | 2.0 | 3.0
Type A | 0.9782 | 0.8714 | 0.7665 | 0.6976 | 0.6503 | 0.6158 | 0.5556 | 0.5026 | 0.3625
Type VII | 0.9964 | 0.9474 | 0.8181 | 0.6519 | 0.4777 | 0.3130 | 0.0000 | — | —

6. THE MAXIMUM LIKELIHOOD EQUATIONS 


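Considering the symmetrical Type A distribution for which

$$P_X\,dX = \Big\{1+\frac{a_4}{4!}H_4(x)\Big\}\,g(x)\,dx,\qquad x=\frac{X-m}{\sigma},$$

and writing $L$ for the likelihood function of a sample $(X_1, X_2, \ldots, X_N)$
we have, after simplification,

$$\frac{\partial\log L}{\partial m} = \frac{1}{\sigma}\,\Sigma\Big\{x-\frac{a_4H_3(x)}{3!\,Q(x)}\Big\}, \tag{44}$$

$$\frac{\partial\log L}{\partial\sigma} = \frac{1}{\sigma}\,\Sigma\Big\{H_2(x)-4+\frac{8-a_4H_2(x)}{2Q(x)}\Big\}, \tag{45}$$

$$\frac{\partial\log L}{\partial a_4} = \frac{1}{a_4}\,\Sigma\Big\{1-\frac{1}{Q(x)}\Big\}, \tag{46}$$

in which $Q(x) = 1+\dfrac{a_4}{4!}H_4(x)$ and $\Sigma$ denotes summation over the sample. For maximum
likelihood estimates we require solutions of these expressions equated to zero. Using
$\bar m, \bar\sigma, \bar a_4$, the moment estimates as first approximations, we may find improved estimates
$\hat m, \hat\sigma, \hat a_4$ from

$$\frac{\partial\log L}{\partial m}+(\hat m-\bar m)\frac{\partial^2\log L}{\partial m^2} = 0, \tag{47}$$

$$\frac{\partial\log L}{\partial\sigma}+(\hat\sigma-\bar\sigma)\frac{\partial^2\log L}{\partial\sigma^2}+(\hat a_4-\bar a_4)\frac{\partial^2\log L}{\partial\sigma\,\partial a_4} = 0, \tag{48}$$

$$\frac{\partial\log L}{\partial a_4}+(\hat\sigma-\bar\sigma)\frac{\partial^2\log L}{\partial a_4\,\partial\sigma}+(\hat a_4-\bar a_4)\frac{\partial^2\log L}{\partial a_4^2} = 0, \tag{49}$$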




















where 
log L a _ls f 4 444 + F(x) _ (4 — 4%) (1+ H,(x)) 
om? é 3! Q(x) 3! Q2(x) F 
®logL 1 (24—100,—-G,H,(x)) (4—G,) (8—d,— 2%,F(2))) 
oa? = 52 (2-3Hy(e) + 20) nae ost 20(z) Zz \, 
2, ae 
0G,0 ao \Q(e) OXe) |’ 
@logL i 1} 
ca |! a 
and Q(x) = Q(x, M, G), 


and in which cross-product terms involving $m$ such as $\partial^2\log L/\partial m\,\partial\sigma$ are ignored since their
expectations are zero. In the case when $a_4<1.5$ approximately we see from Table 5 that
$\bar m$ is an estimate of $m$ with 80% or more efficiency in large sampling. Hence when $a_4<1.5$
improved estimates of $\sigma$ and $a_4$ may be found from $\bar\sigma$ and $\bar a_4$ using (48) and (49) only. The
case when $a_4>1.5$ is likely to be more involved. For this we could use (47) to improve $\bar m$ (or the
median estimate of $m$) and (48)-(49) to improve $\bar\sigma$ and $\bar a_4$. The process could be repeated if
necessary.
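
The first derivatives of $\log L$ for the symmetrical Type A density are easy to evaluate, and can be checked against numerical differences of the log-likelihood itself. The following is a minimal sketch, assuming the density $\sigma^{-1}\{1+(a_4/4!)H_4(x)\}g(x)$ with $g$ the standard normal density; the function names are illustrative, and the check uses finite differences rather than the second-derivative expressions of the text.

```python
import math


def hermite_2_3_4(x):
    """Hermite polynomials H2, H3, H4 in the statistician's normalization."""
    return x * x - 1, x ** 3 - 3 * x, x ** 4 - 6 * x * x + 3


def log_likelihood(sample, m, sigma, a4):
    """log L for the symmetrical Type A density; requires Q(x) > 0 on the sample."""
    total = 0.0
    for X in sample:
        x = (X - m) / sigma
        q = 1 + a4 * hermite_2_3_4(x)[2] / 24
        total += math.log(q) - 0.5 * x * x - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    return total


def score(sample, m, sigma, a4):
    """First derivatives of log L with respect to m, sigma and a4 (a4 != 0)."""
    dm = ds = da = 0.0
    for X in sample:
        x = (X - m) / sigma
        h2, h3, h4 = hermite_2_3_4(x)
        q = 1 + a4 * h4 / 24
        dm += (x - a4 * h3 / (6 * q)) / sigma
        ds += (h2 - 4 + (8 - a4 * h2) / (2 * q)) / sigma
        da += (1 - 1 / q) / a4
    return dm, ds, da


def numerical_score(sample, m, sigma, a4, h=1e-6):
    """Central-difference approximation to the same three derivatives."""
    f = lambda p: log_likelihood(sample, *p)
    base = [m, sigma, a4]
    out = []
    for i in range(3):
        up, down = list(base), list(base)
        up[i] += h
        down[i] -= h
        out.append((f(up) - f(down)) / (2 * h))
    return tuple(out)
```

In an iterative scheme of the kind described above, the moment estimates would be supplied as starting values and the score driven towards zero.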

The parallel problem with Pearson's Type VII has been discussed by Jeffreys (1939b;
1948, pp. 184-7), who uses a different form from (21) with $\nu = 0$, involving orthogonal
parameters $\theta_j$, $j = 1, 2, 3$, in the sense that the expectation of such terms as $\partial^2\log L/\partial\theta_1\,\partial\theta_2$ is
zero. A different approach to the problem has been described by Sichel (1947, 1949), whose
method of frequency-moments is fairly efficient in the range where the method of moments 
is very inefficient. It would seem that if we are to find efficient estimates of the parameters 
involved, then the use of Type VII may be less laborious than Type A. It is clear that the 
solution of the likelihood equations for an asymmetrical Type A distribution would be no 
light undertaking. 


7. SUMMARY 


The results found in this paper may be conveniently summarized as follows: 

(a) For the Type A distribution of four parameters (given by (5)) the joint efficiency of
the method of moments is less than 80% for $0.2<\gamma_2<4.0$, $\gamma_1^2>0.05$; and falls rapidly with
increasing skewness and leptokurtosis, and also in the vicinity of critical values of the
parameters $a_3, a_4$ which arise from the condition that the frequency must be positive. For the
corresponding problem for Pearson's Type IV we have shown that the efficiency $E_4$ is below
80% for $0.35<\gamma_2<1.5$, $\gamma_1^2 = 0$ and above 80% for $0<\gamma_2<0.35$, $\gamma_1^2 = 0$.

(b) For the symmetrical Type A

$$P = \Big\{1+\frac{a_4}{4!}H_4(x)\Big\}\,e^{-\frac12x^2}/\sqrt{(2\pi)},\qquad x=\frac{X-m}{\sigma},$$

and Pearson's Type VII

$$f(X) = \frac{(\tfrac12r)!\;c^{r+1}}{\sqrt{\pi}\,\{\tfrac12(r-1)\}!\,\{c^2+(X-m)^2\}^{\frac12(r+2)}}$$




























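we have:

Parameters estimated, Type A | $m$ | $\sigma$ given $a_4$ | $a_4$ given $\sigma$ | $\sigma$ and $a_4$
Parameters estimated, Type VII | $m$ | $c$ given $r$ | — | $c$ and $r$
$E_s<80\%$, Type A | $1.5<\gamma_2<4.0$ | $1.35<\gamma_2<4.0$ | $0.3<\gamma_2<4.0$ | $0.3<\gamma_2<4.0$
$E_s<80\%$, Type VII | $\gamma_2>6.0$ | $\gamma_2>1.4$ | — | $0.42<\gamma_2<1.5$
$E_s>80\%$, Type A | $0<\gamma_2<1.5$ | $0<\gamma_2<1.35$ | $0<\gamma_2<0.3$ | —
$E_s>80\%$, Type VII | $0<\gamma_2<6.0$ | $0<\gamma_2<1.4$ | — | $0<\gamma_2<0.42$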














(c) In the vicinity of the normal point ($0<\gamma_2<0.5$ approx.) the efficiency of the method
of moments in joint estimation is higher for Type VII than for the symmetrical Type A;
this remark also applies to Type IV and Type A with four parameters when $\gamma_1^2 = 0$. In both
cases it appears that near the normal point efficiency is high. (Note. It has been shown by
Fisher (1921) that the system of curves for which the method of moments gives maximum
likelihood estimates is of the form $y = e^{A(x)}$, where $A(x) = bx^2+cx^3+dx^4$ and, if $c/b$ and $d/b$
are small, satisfies a differential equation similar to Pearson's. From this point of view it
might be anticipated that near the normal point the method of moments would have high 
efficiency for the Pearson system. But, as has been noted by Jeffreys (1939a), the differential 
equation of Fisher’s curves cannot approximate to a Type IV unless d is positive, in which 
case y diverges ultimately. Furthermore, it appears that the Type A curves are not mathe- 
matical approximations near the normal point to the Fisher system and yet we have indicated 
that these have high efficiency. It may be that Type A and Type IV have approximately the 
same moments as the Fisher system near the normal point, after identification of the first 
four moments, which would explain the high efficiency.) 


8. CONCLUSION


Except in the cases of slight departure from normality such as occur when $0<\gamma_2<0.3$ and
$0<\gamma_1^2<0.05$, it appears that the method of moments is inefficient when applied to Type A in
the form considered here. A departure from normality of this size would require a large
sample to detect with any degree of certainty. Thus in the case of the symmetrical Type A
with $\sigma$ known, using (37) we have $N\operatorname{var}\hat a_4\geq3(4-a_4)(2+a_4)$, so that if $a_4$ is to exceed
$3\sqrt{(\operatorname{var}\hat a_4)}$ then $N>5700$ approximately when $a_4 = 0.2$. In general, then, maximum likelihood
estimates should be found whenever there is reason to believe the observed distribution
arises from a Type A population.
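
The sample size quoted above follows from requiring $a_4$ to exceed three standard errors when the variance is at its lower bound $3(4-a_4)(2+a_4)/N$. A minimal sketch of that arithmetic (the function name and the parameter `k` are illustrative):

```python
def min_sample_size(a4, k=3.0):
    """Smallest N for which a4 exceeds k standard errors at the variance bound."""
    n_var_lower_bound = 3.0 * (4.0 - a4) * (2.0 + a4)
    return k * k * n_var_lower_bound / (a4 * a4)


if __name__ == "__main__":
    # For a4 = 0.2 the result is a little under the figure of 5700 quoted in the text.
    print(round(min_sample_size(0.2)))
```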
It is possible that these remarks also apply to Edgeworth's form of Type A,

$$P = \Big\{1+\frac{a_3}{3!}H_3(x)+\frac{a_4}{4!}H_4(x)+\frac{a_3^2}{72}H_6(x)\Big\}\,g(x).$$

It would not, however, be an easy matter to find the admissible parameter values for $P$ to be
positive in this case. But the method described here would lead, in the general case, to an
upper bound for the efficiency of estimation by moments. This would not provide grounds 
for deciding whether Edgeworth’s form was better from the point of view of efficiency, for 
we should be comparing upper bounds. 

The exact bearing of our results on the practical problem of estimating parameters from
a sample from a leptokurtic population is not easy to assess. For the measure of efficiency
calculated is based on a large sample limit and may be lower for a finite sample. Again for
a Pearson distribution in the so-called heterotypic region the efficiency of estimation by
moments may be zero. Thus in sampling from a Type VII population $y = y_0(c^2+x^2)^{-\frac12(r+2)}$,
we have for large samples var $m_2 = 2rc^4/\{N(r-3)(r-1)^2\}$ which is infinite for $r = 3$, so that the
efficiency of an estimate depending on the second moment would be zero. But in finite
sampling in this case the variance of the second moment would not be infinite so that the
efficiency would not be as low as the large sample figure suggests. A similar remark applies
to estimation by moments in the case of Type A when the parameters lie in the vicinity of
the critical region (a border strip on the curvilinear boundary of Fig. 1). For in this region
we have shown that the efficiency is low or zero, but in practice it may be that the method of
moments is not so useless.

The comparison between the Pearson Type VII and symmetrical Type A distributions 
having the same first four moments suggests that the method of moments is more efficient 
with the former, at least in the vicinity of the normal point. Further, in a situation in which 
we consider that an observed distribution can be described either by a Pearson or Type A 
curve, the shape of the distribution may be important. Now the Pearson system is unimodal 
but this is not the case in general with Type A. The form of Type A considered here may 
have three modes, and from a practical point of view statistical data showing such a charac- 
teristic might be regarded as heterogeneous. Thus if we are thinking in terms of unimodal 
distributions and do not allow negative frequencies the admissible parameter values for 
Type A will not be as extensive as is implied in Fig. 1. 

The method we have described in this paper could be applied to other forms of Type A, 
such as those based on the logarithmically transformed normal curve or Pearson’s Type III, 
provided the questions of convergence involved in (2) were settled. Discrete distributions 
such as Charlier’s Type B could also be treated. 


REFERENCES 


BURNSIDE, W. S. & PANTON, A. W. (1881). Theory of Equations. Longmans.

FISHER, R. A. (1921). On the mathematical foundations of theoretical statistics. Philos. Trans. A,
222, 309.

JEFFREYS, H. (1939a). The law of error and the combination of observations. Philos. Trans. A, 237,
231.

JEFFREYS, H. (1939b). The law of error in the Greenwich variation of latitude observations. Mon. Not.
R. Astr. Soc. 99, 703.

JEFFREYS, H. (1948). Theory of Probability. Oxford: Clarendon Press.

MUIR, T. (1882). A Treatise on the Theory of Determinants. Macmillan and Co.

PEARSON, K. (1945). Tables for Statisticians and Biometricians, Part 1. Biometrika Office.

SHENTON, L. R. (1950). Maximum likelihood and the efficiency of the method of moments. Biometrika,
37, 111.

SICHEL, H. S. (1947). Fitting growth and frequency curves by the method of frequency-moments. J.R.
Statist. Soc. 110, 337.

SICHEL, H. S. (1949). The method of frequency-moments and its application to Type VII populations.
Biometrika, 36, 404.










ON DISTRIBUTIONS FOR WHICH THE HARTLEY-KHAMIS 
SOLUTION OF THE MOMENT-PROBLEM IS EXACT 


By H. P. MULHOLLAND 


1. INTRODUCTION 


Suppose that the moments of a certain distribution are known exactly up to order n 
(inclusive). Hartley & Khamis (1947) have described a procedure yielding estimated values 
of the cumulative distribution function (c.d.f.) at equally spaced points belonging to a chosen 
set or ‘grid’; the resulting table for the c.d.f. has a rather wide tabular interval, h say, and 
subtabulation can be achieved by shifting the grid (Hartley & Khamis, 1947, §7, especially 
p. 348). The present paper is intended to give a clue to the type of case for which this pro- 
cedure is suitable by determining the class of distributions that it reproduces exactly. 
(Owing to the basis of the procedure (Hartley & Khamis, 1947, §2) this happens when 
Sheppard’s corrections for grouping are exact.) It will appear in §§3 and 4 (I), that the 
distributions in question are those generated by the procedure itself in the cases where its 
estimate for the c.d.f. is an increasing function, and that in any case its result is representable 
by a finite expansion fitted by moments and resembling the Gram-Charlier Type A expansion 
(carried as far as the given moments permit). Moreover, the leading term in the former 
expansion represents the n-fold convolution of a rectangular distribution with itself. Since 
this rapidly approaches normality as n increases, the resemblance to the Type A expansion 
is numerical as well as formal; it will be discussed further and illustrated numerically in §8. 
Altogether, despite its apparent dissimilarity, the Hartley-Khamis procedure is akin to the 
various classical expansion methods, especially the Type A expansion. We may thus expect 
the procedure to work best when the leading term in its expansion yields already a good 
first approximation. It is to be noted that the distribution given by this term, besides 
being nearly normal, is symmetrical, slightly platykurtic, and of finite range. 

In §4(II), certain classes of distributions are shown to have exact Hartley-Khamis 
estimates for one position of the grid. In §7 (a) alternative methods of subtabulation are 
compared, especially as regards continuity; in §7(b) I give rules for choosing the interval h; 
and in §7 (c)-(e), it is noted that some of the formulae leading to (I) can serve as the
starting-points of investigations into the magnitudes of the errors in the Hartley-Khamis estimates.


2. NOTATION AND PRELIMINARIES 


After a suitable change of scale and origin (cf. Hartley & Khamis, 1947, §3 (15), p. 343) we
shall have $h$ replaced by 1 and the centres of the grid-intervals or 'groups' at the points

$$x = \xi_i = i-\tfrac12n \quad (i = \ldots, -1, 0, 1, \ldots, n, n+1, \ldots). \tag{2.1}$$

Supposing that the c.d.f. is now $P(x)$, we introduce its central difference

$$p(x) = \delta P(x) = P(x+\tfrac12)-P(x-\tfrac12). \tag{2.2}$$

Then for the 'grouped frequencies' $p(\xi_i)$ the Hartley-Khamis procedure provides estimates
$p^*(\xi_i)$ determined by the conditions

$$\sum_{i=0}^{n}\xi_i^r\,p^*(\xi_i) = \int_{-\infty}^{\infty}x^r\,p(x)\,dx = \bar\mu_r \quad (r = 0, 1, \ldots, n), \tag{2.3}$$

$$p^*(\xi_i) = 0 \quad (i = \ldots, -2, -1, \text{ or } i = n+1, n+2, \ldots). \tag{2.4}$$







In fact, the moments $\bar\mu_r$ of $p(x)$ about 0, when combined with Sheppard's corrections (for
groups of unit width), yield the moments $\mu_r'$ about 0 belonging to the c.d.f. $P(x)$, so that
equations (2.3) are equivalent to Hartley & Khamis's equations (13) (1947, p. 342; see also
Kendall, (3.40), p. 70, and (3.43), (3.44), p. 71). The simultaneous equations (2.3) may be
solved numerically for the numbers $p^*(\xi_i)$ $(i = 0, 1, \ldots, n)$, as is done by Hartley and Khamis.
Alternatively, observing that, subject to (2.4), the equations (2.3) hold if and only if

$$\sum_{i=-\infty}^{\infty}\pi_n(\xi_i)\,p^*(\xi_i) = \int_{-\infty}^{\infty}\pi_n(x)\,p(x)\,dx \tag{2.5}$$

for an arbitrary polynomial $\pi_n(x)$ of degree not exceeding $n$, we may substitute for $\pi_n(x)$
a fundamental Lagrange interpolation polynomial $l_\xi(x)$ given by

$$l_\xi(x) = \frac{\phi_{n+1}(x)}{(x-\xi)\,\phi_{n+1}'(\xi)} \quad (\xi = -\tfrac12n, -\tfrac12n+1, \ldots, \tfrac12n), \tag{2.6}$$

$$\phi_{n+1}(x) = (x+\tfrac12n)^{(n+1)} = (x+\tfrac12n)(x+\tfrac12n-1)\ldots(x-\tfrac12n). \tag{2.7}$$

Since $l_\xi(\xi) = 1$ and $l_\xi(\xi_i) = 0$ $(\xi_i\neq\xi)$, we get from (2.5)

$$p^*(\xi) = \int_{-\infty}^{\infty}l_\xi(x)\,p(x)\,dx \quad (\xi = \xi_i = i-\tfrac12n;\ i = 0, 1, \ldots, n); \tag{2.8}$$

and, conversely, (2.8) and (2.4) imply (2.5), since any $\pi_n(x)$ is a linear combination of the
polynomials (2.6). Further, the equality in (2.8) will hold, by (2.4), for the other values of
$\xi$ also if we write

$$l_\xi(x) = 0 \quad (\xi = \xi_i = i-\tfrac12n;\ i = \ldots, -2, -1, n+1, n+2, \ldots). \tag{2.9}$$
Similarly, if we take the grid symmetrical about $\eta$ instead of about 0 we shall get

$$p^*(\xi+\eta) = \int_{-\infty}^{\infty}l_\xi(x-\eta)\,p(x)\,dx \quad (\xi = i-\tfrac12n;\ i = \ldots, -1, 0, 1, 2, \ldots). \tag{2.10}$$

If $\eta$ is allowed to vary continuously in the interval $-\tfrac12<\eta\leq\tfrac12$ the function $p^*(x)$ is now defined
for all values of $x$ by (2.10),† and we may obtain a corresponding estimate $P^*(x)$ for $P(x)$ by
summation:

$$P^*(x) = \sum_{j=0}^{\infty}p^*(x-\tfrac12-j). \tag{2.11}$$

This series is only nominally infinite, in virtue of (2.9). Clearly $p^*(x)$ is obtained from $p(x)$
by a linear integral transformation

$$p^*(x) = \int_{-\infty}^{\infty}K_n(x,y)\,p(y)\,dy, \tag{2.12}$$

whose kernel $K_n(x,y)$ is given, according to (2.10), by

$$K_n(x,y) = l_\xi(\xi+y-x) \quad (-\tfrac12<x-\xi\leq\tfrac12,\ \xi = i-\tfrac12n;\ i = \ldots, -1, 0, 1, 2, \ldots). \tag{2.13}$$

Evidently, for fixed $x$, $K_n(x,y)$ is a polynomial of degree $n$ in $y$ (or is zero). Thus we shall have,
with $\xi$ and $x$ as in (2.13),

$$K_n(x,y) = \sum_{r=0}^{n}l_\xi^{(r)}(\xi-x)\,y^r/r!, \tag{2.14}$$

$$p^*(x) = \sum_{r=0}^{n}l_\xi^{(r)}(\xi-x)\,\bar\mu_r/r!. \tag{2.15}$$

When $x = \xi$ equation (2.15) corresponds to Hartley & Khamis's equation (14) (1947, p. 343),
and, indeed, yields explicit formulae for the coefficients in their equation.
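
The route through the Lagrange polynomials (2.6) to the grouped-frequency estimates (2.8) can be illustrated in exact arithmetic. The sketch below assumes that the moments of $p(x)$ about 0 are supplied as a list `moments[0..n]`; the function names are illustrative and are not taken from Hartley & Khamis.

```python
from fractions import Fraction


def grid(n):
    """Group centres xi_i = i - n/2 for i = 0, ..., n, as in (2.1)."""
    return [Fraction(i) - Fraction(n, 2) for i in range(n + 1)]


def lagrange_coefficients(nodes, k):
    """Coefficients (ascending powers) of the polynomial equal to 1 at nodes[k]
    and 0 at every other node."""
    coeffs = [Fraction(1)]
    denom = Fraction(1)
    for j, xj in enumerate(nodes):
        if j == k:
            continue
        shifted = [Fraction(0)] * (len(coeffs) + 1)
        for d, c in enumerate(coeffs):       # multiply the running product by (x - xj)
            shifted[d + 1] += c
            shifted[d] -= xj * c
        coeffs = shifted
        denom *= nodes[k] - xj
    return [c / denom for c in coeffs]


def grouped_estimates(moments):
    """p*(xi_i): integrate each Lagrange polynomial term by term against p(x),
    which only needs the moments of p(x) up to order n."""
    n = len(moments) - 1
    nodes = grid(n)
    return [sum(c * Fraction(mu) for c, mu in zip(lagrange_coefficients(nodes, k), moments))
            for k in range(n + 1)]


if __name__ == "__main__":
    # n = 2: total mass 1, mean 0, second moment 1/2 on the grid -1, 0, 1.
    print(grouped_estimates([1, 0, Fraction(1, 2)]))
```

By construction the estimates reproduce the supplied moments when summed against powers of the grid points, which is the content of (2.3).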


† For other ways of obtaining estimates for $p(x)$ at values of $x$ intermediate between those in (2.1)
see §7 (a).













3. PRESERVATION OF MOMENTS AND EXPANSION PROPERTIES 


Let $g(x)$ be any function of bounded variation on $(-\infty, \infty)$ such that the absolute moments
of $|g(x)|$ exist up to the $n$th order (inclusive), and let us transform it as in (2.12) so as to get

$$g^*(x) = \int_{-\infty}^{\infty}K_n(x,y)\,g(y)\,dy. \tag{3.1}$$

We note first that this transformation preserves moments of orders not exceeding $n$. In fact, from
the construction of the transformation (cf. especially (2.3) and (2.10) above) we have

$$\mu_r'[g] = \int_{-\infty}^{\infty}x^rg(x)\,dx = \sum_{i=-\infty}^{\infty}(\xi_i+\eta)^r\,g^*(\xi_i+\eta) \quad (r = 0, 1, \ldots, n), \tag{3.2}$$

and on averaging both sides over the interval $-\tfrac12<\eta\leq\tfrac12$ we get

$$\mu_r'[g] = \int_{-\frac12}^{\frac12}\sum_{i=-\infty}^{\infty}(\xi_i+\eta)^r\,g^*(\xi_i+\eta)\,d\eta = \int_{-\infty}^{\infty}x^rg^*(x)\,dx = \mu_r'[g^*],$$

for $r = 0, 1, \ldots, n$, as required. (As in (2.11) the series are only nominally infinite.)

Secondly, since the transform of a function according to (3.1) depends only on its moments
(of orders not exceeding $n$), it is clear from what has just been proved that the transformation
(3.1) is 'consistent', i.e. that $g^{**}(x) = g^*(x)$. Let us note that this property is equivalent to the
following relation for the 'iterated' kernel:

$$K_n^{(2)}(x,y) = \int_{-\infty}^{\infty}K_n(x,u)\,K_n(u,y)\,du = K_n(x,y). \tag{3.3}$$

(The range of integration is only nominally infinite on account of (2.9) and (2.13).)
Thirdly, the kernel $K_n(x,y)$ possesses expansions in the form

$$K_n(x,y) = \sum_{r=0}^{n}\alpha_r(x)\,\beta_r(y), \tag{3.4}$$

where the functions $\alpha_r(x)$ $(r = 0, 1, \ldots, n)$ are linearly independent and $\beta_r(y)$ is a polynomial of
degree exactly $r$: the kernel $K_n(x,y)$ is thus 'degenerate'. In fact, the expansion (2.14) is clearly
of the type (3.4), and other such expansions are obtainable (e.g. that in (5.1) below).
Substituting the expansion (3.4) for $K_n$ in (3.3) we find

$$\sum_{r=0}^{n}\sum_{s=0}^{n}\alpha_r(x)\,\beta_s(y)\int_{-\infty}^{\infty}\alpha_s(u)\,\beta_r(u)\,du = \sum_{r=0}^{n}\alpha_r(x)\,\beta_r(y).$$

Since the $(n+1)^2$ functions $\alpha_r(x)\,\beta_s(y)$ of $x$ and $y$ are clearly linearly independent we must have

$$\int_{-\infty}^{\infty}\alpha_s(u)\,\beta_r(u)\,du = \begin{cases}0 & (r\neq s)\\ 1 & (r = s)\end{cases} \quad (r, s = 0, 1, \ldots, n), \tag{3.5}$$

i.e. the two sets of functions $\{\alpha_r(u)\}$, $\{\beta_r(u)\}$ form a normalized biorthogonal system.
Using (3.4) we get an expansion for $g^*(x)$ as follows:

$$g^*(x) = \int_{-\infty}^{\infty}\sum_{r=0}^{n}\alpha_r(x)\,\beta_r(y)\,g(y)\,dy = \sum_{r=0}^{n}g_r\,\alpha_r(x), \tag{3.6}$$

$$g_r = \int_{-\infty}^{\infty}\beta_r(y)\,g(y)\,dy. \tag{3.7}$$

Further, from (3.5) and (3.6) it follows that $g(x)$ satisfies the integral equation $g(x) = g^*(x)$, i.e.

$$g(x) = \int_{-\infty}^{\infty}K_n(x,y)\,g(y)\,dy, \tag{3.8}$$








if and only if it has the form
$$g(x) = \sum_{r=0}^{n}g_r\,\alpha_r(x) \tag{3.9}$$
with constant coefficients $g_r$.
4. MAIN RESULTS 
From among the possible expansions of $K_n(x,y)$ in the form (3.4) we now select one in which
the leading term is continuous together with as many of its derivatives as possible. The results
that emerge are given in (I) and (II) below; in them the functions $P(x)$ (hitherto confined
to c.d.f.'s) are subject only to the restrictions

$$\int_{-\infty}^{\infty}|x|^r\,|dP(x)|<\infty \quad (r = 0, 1, \ldots, n), \tag{4.1}$$

which when $P(x)$ is a c.d.f. assert the finiteness of its absolute moments of orders 0 to $n$.
(I) With the notations of §2 and subject to (4.1) we have
$$P^*(x) = P(x) \tag{4.2}$$
for all values of $x$ if and only if $P(x)$ is of the form

$$P(x) = \sum_{r=0}^{n}a_r\,V_n^{(r)}(x) \tag{4.3}$$

with constant coefficients $a_r$, where $V_n(x)$ is the c.d.f. of the sum of $n$ independent variates each of
which has the rectangular distribution on $(-\tfrac12, \tfrac12)$. Further, in all cases we have

$$P^*(x) = \sum_{r=0}^{n}a_r\,V_n^{(r)}(x) = \sum_{r=0}^{n}V_n^{(r)}(x)\int_{-\infty}^{\infty}\bar\omega_r(y)\,dP(y), \tag{4.4}$$

t+} n—? o{n} 
w=] wordy, wi =(-¥ (ZF) EE = Om), (48) 


In particular, $V_n(x)$ is the only c.d.f. with a continuous $(n-1)$th derivative for which (4.2) holds
identically.
(II) The equality (4.2) holds when $x+\tfrac12$ belongs to the discrete set of points (2.1) if

$$P(x) = \sum_{r=0}^{n}b_r\,W_{n+1}^{(r)}(x), \tag{4.6}$$

where the $b_r$'s are arbitrary constants and $W_{n+1}(x)$ is the c.d.f. of the sum of two independent
variates, one with c.d.f. $V_n(x)$ and the other arbitrarily distributed on the open interval $(-\tfrac12, \tfrac12)$.
In particular, (4.2) holds whenever $x+\tfrac12$ is one of the points (2.1) if

$$P(x) = \sum_{r=0}^{n}c_r\,V_{n+1}^{(r)}(x), \tag{4.7}$$

where the $c_r$'s are arbitrary constants; and, further, in all cases, if we write

$$P_1(x) = \sum_{r=0}^{n}c_r\,V_{n+1}^{(r)}(x) = \sum_{r=0}^{n}V_{n+1}^{(r)}(x)\int_{-\infty}^{\infty}\omega_r(y)\,dP(y), \tag{4.8}$$

we shall have $P^*(x+\tfrac12) = P_1(x+\tfrac12)$ at all the points (2.1).

We observe, first, that $V_n(x)$ defines‡ a symmetrical distribution with finite range
$(-\tfrac12n, \tfrac12n)$, mean 0, variance $\tfrac1{12}n$, and $\beta_2-3 = -6/(5n)$; it approximates more and more
closely to a normal c.d.f. as $n$ increases; and its first $n-1$ derivatives are continuous and
vanish at $x = \pm\tfrac12n$, while $V_n^{(n)}(x)$ is a step-function of oscillating sign. Thus, in particular, the

† For the properties of $V_n(x)$ cf. Uspensky (1937, pp. 277, 278 and 305), Kendall (1943, pp. 242 and 245), and Haldane (1945). An explicit formula for $V_n(x)$ is quoted below, §5 (5·9).











78 Hartley-Khamis solution of the moment-problem 


expansion in (4·3), or (4·4), can only represent a true c.d.f. when $a_n = 0$. Secondly, in (4·6), or (4·7), the range is $(-\tfrac12 n - \tfrac12, \tfrac12 n + \tfrac12)$, and the continuity properties of these expansions depend on those of the arbitrary component in $W_{n,r}$. The sum in (4·7), or (4·8), is always continuous. Thirdly, it may be noted that the polynomials $w_r(y)$ and $\omega_r(y)$ introduced in (4·5) are expressible in terms of generalized Bernoulli polynomials† as follows:
$$w_r(y) = (-)^r B_r^{(n+1)}(\tfrac12 n + \tfrac12 + y)/r!, \qquad \omega_r(y) = (-)^r B_r^{(n)}(\tfrac12 n + y)/r!. \tag{4·9}$$
[Added 12 June 1950. If we indicate dependence on $n$ by writing $P^*(x, n)$, $a_r(n)$, etc., for $P^*(x)$, $a_r$, etc., we see from (4·9), (4·4) and (4·8) that
$$w_r(y, n) = \omega_r(y, n+1), \qquad P_w(x, n) = P^*(x, n+1) - a_{n+1}(n+1)\,V_{n+1}^{(n+1)}(x). \tag{4·10}$$
Thus, when $\mu_1', \ldots, \mu_n'$ are known $P_w(x, n)$ may be calculated as follows: determine a fictitious value $\tilde\mu_{n+1}'$ as a substitute for the possibly unknown moment $\mu_{n+1}'$, in such a way as to reduce to zero the coefficient $a_{n+1}(n+1)$, which by (4·9) equals the expectation of
$$(-)^{n+1} B_{n+1}^{(n+1)}(\tfrac12 n + \tfrac12 + y)/(n+1)!;$$
then calculate $P^*(x, n+1)$ as if the first $n+1$ moments were $\mu_1', \ldots, \mu_n', \tilde\mu_{n+1}'$.]


5. PROOF OF (I)

To prove (I) it will suffice to establish the expansion
$$K_n(x, y) = \sum_{r=0}^{n} v_r(x)\,w_r(y) \tag{5·1}$$
with $w_r(y)$ given by (4·5) and $v_r(x)$ by
$$v_r(x) = V_{n+1}^{(r+1)}(x) \quad (r = 0, 1, \ldots, n). \tag{5·2}$$
In fact, substitution of (5·1) in (2·12) gives, as in (3·6),
$$p^*(x) = \sum_{r=0}^{n} a_r v_r(x) = \sum_{r=0}^{n} a_r V_{n+1}^{(r+1)}(x), \tag{5·3}$$
$$a_r = \int_{-\infty}^{\infty} w_r(y)\,\bar p(y)\,dy = \int_{-\infty}^{\infty} w_r(y)\,dy \int_{y-\frac12}^{y+\frac12} dP(t) = \int_{-\infty}^{\infty} dP(t) \int_{t-\frac12}^{t+\frac12} w_r(y)\,dy = \int_{-\infty}^{\infty} \omega_r(t)\,dP(t). \tag{5·4}$$
Now, from (2·11), $\delta P^*(x) = p^*(x)$ and, from the definition of $V_n(x)$ and $V_{n+1}(x)$,
$$V_{n+1}^{(r+1)}(x) = D^r V_{n+1}'(x) = D^r \int_{-\frac12}^{\frac12} V_n'(x - t)\,dt = D^r\,\delta V_n(x) = \delta V_n^{(r)}(x).$$
Hence the relation (5·3), with (5·4), is equivalent to
$$\delta P^*(x) = \delta \sum_{r=0}^{n} a_r V_n^{(r)}(x), \tag{5·5}$$
and thus to (4·4) since both sides of the latter vanish for $x < -\tfrac12 n - \tfrac12$. Similarly, the condition (4·3), for (4·2) to hold identically, is equivalent to a special case of (3·9) (with $\bar p(x)$ for $g(x)$, $v_r(x)$ for $\alpha_r(x)$, and $w_r(y)$ for $\beta_r(y)$). Lastly, the sum in (4·3), or (4·4), can only have a continuous derivative of order $n-1$ (and therefore also of all lower orders) if all the coefficients vanish except $a_0$; for the $k$th derivative of $a_r V_n^{(r)}(x)$ is continuous for $k < n - r$, is discontinuous unless $a_r = 0$ for $k = n - r$, and vanishes except at isolated points for $k > n - r$.


† For relevant formulae see Milne-Thomson (1933, §§6·1-6·4, pp. 126-30, Ex. 1 (ii), p. 150, and 7·03, especially p. 160).

























H. P. MuLHOLLAND 79 


To obtain the expansion (5·1) we shall first resolve $K_n(x, y)$ into a sum of a variable number of terms, each of which enters the sum as $(x, y)$ crosses one of the lines of discontinuity $x - \xi + \tfrac12 = 0$ from left to right, and shall then expand each such term in descending powers of $x - \xi + \tfrac12$; the effect will be to analyse $K_n(x, y)$ into components of which the first has a continuous $(n-1)$th derivative (with respect to $x$) and the last is itself discontinuous.

For the first step we require the identity
$$l_\xi(\xi + t) = \sum_{i=0}^{\xi + \frac12 n} (-)^i \binom{n+1}{i}\, l_{-\frac12 n}(i - \tfrac12 n + t) \quad (\xi + \tfrac12 n = 0, 1, \ldots, n). \tag{5·6}$$
Since $l_\xi$ and $l_{-\frac12 n}$ are polynomials of degree $n$ in $t$ it will suffice to establish (5·6) for the $n + 1$ values $t = -\tfrac12 n - \xi,\ -\tfrac12 n - \xi + 1,\ \ldots,\ \tfrac12 n - \xi$. But for these (i) when $1 \le t \le \tfrac12 n - \xi$ (in the cases $\xi \le \tfrac12 n - 1$) all the terms in (5·6) vanish, (ii) when $t = 0$ the only non-vanishing terms are $l_\xi(\xi) = 1$ on the left and $l_{-\frac12 n}(-\tfrac12 n) = 1$ on the right, (iii) when $-\tfrac12 n - \xi \le t \le -1$ (in the cases $\xi \ge -\tfrac12 n + 1$) the left-hand side vanishes and the right-hand side is unaltered by the inclusion of the further terms with $i = \xi + \tfrac12 n + 1, \ldots, n + 1$, since these all vanish, and it then becomes $\delta^{n+1} l_{-\frac12 n}(t + \tfrac12)$, which vanishes because $l_{-\frac12 n}$ is of degree only $n$.
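Secondly, using (2·6) and (2·7), we find
$$l_{-\frac12 n}(\xi + y - x) = (-)^n \phi_n(y - x + \xi - \tfrac12)/n! = (-)^n \sum_{j=0}^{n} \frac{(-x + \xi - \tfrac12)^j\, \phi_n^{(j)}(y)}{j!\; n!} = \sum_{r=0}^{n} \frac{(x + \tfrac12 - \xi)^{n-r}}{(n-r)!}\, w_r(y) \tag{5·7}$$
by application of Taylor's theorem to the polynomial $\phi_n$ and use of (4·5). On writing $i - \tfrac12 n$ for $\xi$ in (5·7) and substituting in (5·6) we get from (2·13), for $-\tfrac12 \le x - \xi < \tfrac12$ and $\xi + \tfrac12 n = 0, 1, \ldots, n$,
$$K_n(x, y) = \sum_{i=0}^{\xi + \frac12 n} (-)^i \binom{n+1}{i}\, l_{-\frac12 n}(i - \tfrac12 n + y - x) = \sum_{r=0}^{n} w_r(y) \sum_{i=0}^{\xi + \frac12 n} (-)^i \binom{n+1}{i} \frac{(x + \tfrac12 + \tfrac12 n - i)^{n-r}}{(n-r)!} = \sum_{r=0}^{n} w_r(y)\, V_{n+1}^{(r+1)}(x), \tag{5·8}$$
as required for (5·1), in virtue of the known formula†
$$V_n(t - \tfrac12 n) = \frac{1}{n!} \sum_{i=0}^{k} (-)^i \binom{n}{i} (t - i)^n \quad (k \le t \le k + 1,\ k = 0, 1, \ldots, n - 1). \tag{5·9}$$
The cases where $|\xi| \ge \tfrac12 n + 1$ are trivial, since in them both sides of (5·8) vanish.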
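
As a numerical companion to (5·9), the following Python sketch evaluates $V_n(x)$ directly from that formula and checks the variance $n/12$ and the value $\beta_2 - 3 = -6/(5n)$ quoted in §4. It is an illustration only and not part of the original paper; the names `V` and `moment` and the midpoint-rule step count are assumptions made here.

```python
from math import comb, factorial, floor

def V(n, x):
    """C.d.f. of the sum of n independent uniform(-1/2, 1/2) variates, from (5.9)."""
    t = x + n / 2.0                      # shift so that the support becomes [0, n]
    if t <= 0.0:
        return 0.0
    if t >= n:
        return 1.0
    k = floor(t)                         # k <= t < k + 1
    s = sum((-1) ** i * comb(n, i) * (t - i) ** n for i in range(k + 1))
    return s / factorial(n)

def moment(n, power, steps=6000):
    """Midpoint-rule estimate of E[X^power], using increments of V over small cells."""
    h = n / steps
    total = 0.0
    for j in range(steps):
        a = -n / 2.0 + j * h
        total += (a + h / 2.0) ** power * (V(n, a + h) - V(n, a))
    return total

if __name__ == "__main__":
    n = 6
    var = moment(n, 2)
    excess = moment(n, 4) / var ** 2 - 3.0
    print(var, n / 12.0)                 # the two values should agree closely
    print(excess, -6.0 / (5 * n))        # likewise
```

The closed form is exact on each unit interval, so the only approximation in the moment check comes from the midpoint rule.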

6. PROOF OF (II)

Suppose $Q(x)$ to be given by
$$Q(x) = \sum_{r=0}^{n} b_r V_n^{(r)}(x). \tag{6·1}$$
Then, by (I), writing $\bar q(x)$ for $\delta Q(x)$, we have
$$\bar q(x) = \int_{-\infty}^{\infty} K_n(x, y)\,\bar q(y)\,dy. \tag{6·2}$$

† Cf. Uspensky (1937, pp. 277 and 278), and also the other references given above in §4.










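Now let $W(u)$ be the c.d.f. of an arbitrary distribution on the open interval $(-\tfrac12, \tfrac12)$ and let $P(x)$ be derived from $Q(x)$ and $W(x)$ by convolution, so that
$$P(x) = \int_{-\frac12}^{\frac12} Q(x - u)\,dW(u). \tag{6·3}$$
Evidently, (6·1) and (6·3) are together equivalent to (4·6). Now, for
$$\xi = i - \tfrac12 n \quad (i = \ldots, -1, 0, 1, 2, \ldots)$$
we have from (6·2) and (6·3),
$$\begin{aligned}
\bar p(\xi) = \delta P(\xi) &= \int_{-\frac12}^{\frac12} \delta Q(\xi - u)\,dW(u) = \int_{-\frac12}^{\frac12} \bar q(\xi - u)\,dW(u) \\
&= \int_{-\frac12}^{\frac12} dW(u) \int_{-\infty}^{\infty} K_n(\xi - u, y)\,\bar q(y)\,dy \\
&= \int_{-\frac12}^{\frac12} dW(u) \int_{-\infty}^{\infty} K_n(\xi, z)\,\bar q(z - u)\,dz \\
&= \int_{-\infty}^{\infty} K_n(\xi, z)\,dz \int_{-\frac12}^{\frac12} \bar q(z - u)\,dW(u) \\
&= \int_{-\infty}^{\infty} K_n(\xi, z)\,\bar p(z)\,dz = p^*(\xi),
\end{aligned} \tag{6·4}$$
since, by (2·13),
$$K_n(\xi - u, z - u) = l_\xi\{\xi + z - u - (\xi - u)\} = l_\xi(\xi + z - \xi) = K_n(\xi, z) \quad (-\tfrac12 < u < \tfrac12).$$
(It is to be noted that the range of integration in (6·4) is only nominally infinite, and that $K_n(\xi, z)\,\bar q(z - u)$ is bounded and, except on isolated lines, is continuous.) From (6·4) we have $\delta P(\xi) = \delta P^*(\xi)$, and hence the required relation $P(\xi + \tfrac12) = P^*(\xi + \tfrac12)$ by summation, as after (5·5).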

For the second part of the theorem we take $W(u)$ to be the c.d.f. of the rectangular distribution on $(-\tfrac12, \tfrac12)$. The assertion involving (4·7) then follows at once from the first part. Further, if we now allow $P(x)$ to be any function satisfying (4·1), $P_w(x)$ as defined in (4·8) will be of the form (4·7), and hence we shall have $P_w^*(\xi + \tfrac12) = P_w(\xi + \tfrac12)$ at each of the set (2·1). To complete the proof it thus suffices to show that $P(x)$ and $P_w(x)$ have the same moments (of orders not exceeding $n$), so that $P_w^*(x) = P^*(x)$. Now for $r = 0, 1, \ldots, n$ we have
$$\int_{-\infty}^{\infty} w_r(y)\,dP_w(y) = \sum_{s=0}^{n} c_s \int_{-\infty}^{\infty} w_r(y)\,V_{n+1}^{(s+1)}(y)\,dy = c_r = \int_{-\infty}^{\infty} w_r(y)\,dP(y), \tag{6·5}$$
from (4·8), by the properties of type (3·5) possessed by the functions $v_r(u) = V_{n+1}^{(r+1)}(u)$ and $w_r(u)$ (cf. (3·4) and (5·1)). Since $w_r(y)$ is a polynomial of degree exactly $r$ the required equality of moments follows from (6·5).


7. DISCUSSION OF APPLICATIONS TO GENERAL DISTRIBUTIONS


It remains for us to consider the bearing of the results above upon the use of the Hartley- 
Khamis procedure for distributions in general, and we shall discuss four points that are 
relevant to practical applications or to further research. The first three points will be 
considered in the present section and the fourth in §8. 

(a) Alternative methods of obtaining intermediate values. Of various available methods 
yielding estimates for P(x) at values of x intermediate between the numbers 


$$x = \xi_i + \tfrac12 = i - \tfrac12 n + \tfrac12 \quad (i = \ldots, -1, 0, 1, 2, \ldots)$$







we have so far considered only (i) the continuously movable grid, which gives $P^*(x)$, as in (2·10) and (2·11). Another method† is (ii) the fixed grid supplemented by ordinary interpolation based on some of the values $P^*(\xi_i + \tfrac12)$ $(i = \ldots, -2, -1, 0, 1, 2, \ldots)$: e.g. (α) Lagrange interpolation using $(i = -1, 0, 1, \ldots, n)$; or (β) central-difference interpolation using an even number of values; or (γ) the same but using an odd number. A compromise between (i) and (ii) is method (iii) use of a few positions of the grid supplemented by ordinary interpolation.‡ Hartley & Khamis (1947, §7) suggest the use of two, three, or even four equally spaced positions; they do not appear to contemplate practical use of a continuously moving grid. Lastly, we may mention (iv) use of $P_w(x)$, defined in (4·8), as an estimate for $P(x)$. We now remark, first, that methods (i) and (iv) preserve moments of orders not exceeding $n$ (cf. §§3 and 6), while methods (ii) and (iii) in general do not; secondly, while most of these methods give continuous estimates for $P(x)$, methods (i), (ii) (γ), and (iii) (γ) give discontinuous estimates in general. The discontinuity of $P^*(x)$ (when $a_n \ne 0$) results from the restriction $-\tfrac12 \le \eta < \tfrac12$ imposed upon the 'phase' $\eta$ in (2·10) in order to make $P^*(x)$ one-valued. We note from (4·4) that the discontinuous term in $P^*(x)$ is $a_n V_n^{(n)}(x)$, and from (5·9) that the jumps of $V_n^{(n)}(x)$ are alternately positive and negative and numerically equal to the binomial coefficients of order $n$. Thus the jumps of $P^*(x)$ can easily be calculated. As an illustration, in the second test-example worked out by Hartley & Khamis (1947, §7), the jumps would be proportional to 1, -6, 15, -20, 15, -6, 1, and the magnitude of the greatest jump would be about 0.00083, which is about twice as large as the greatest actual discrepancy between such of the estimates as they calculated and the true values.§
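
The statement about the jumps can be made concrete with a short Python sketch (an illustration added here, not part of the original paper). It lists the jumps of $V_n^{(n)}(x)$ at $x = -\tfrac12 n, -\tfrac12 n + 1, \ldots, \tfrac12 n$ as signed binomial coefficients, which is what differentiating (5·9) $n$ times gives; the function names are hypothetical, and the value of $a_n$ in the usage lines is merely back-calculated from the greatest jump 0.00083 quoted above.

```python
from math import comb

def jumps_of_top_derivative(n):
    """Jumps of V_n^{(n)} at x = -n/2 + i (i = 0..n): the signed binomial coefficients."""
    return [(-1) ** i * comb(n, i) for i in range(n + 1)]

def jumps_of_estimate(a_n, n):
    """Jumps of the discontinuous term a_n * V_n^{(n)}(x) of P*(x)."""
    return [a_n * j for j in jumps_of_top_derivative(n)]

if __name__ == "__main__":
    print(jumps_of_top_derivative(6))       # the pattern quoted for the test-example
    a6 = 0.00083 / 20                        # assumed: inferred from the greatest jump
    print(max(abs(j) for j in jumps_of_estimate(a6, 6)))
```

Because the jumps sum to zero, the discontinuous term returns to zero beyond the range, as it must.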

[Added 12 June 1950. This discontinuous final term in $P^*(x)$ is also of interest as giving some indication of the difference made by bringing in one further moment (though without changing the scale of the grid). In fact, from (4·10),
$$a_{n+1}(n+1)\,V_{n+1}^{(n+1)}(x) = P^*(x, n+1) - P_w(x, n),$$
while by §4 (II), we have $P_w(x, n) = P^*(x, n)$ whenever $x$ has one of the values
$$i - \tfrac12 n - \tfrac12 \quad (i = \ldots, -1, 0, 1, \ldots, n, n+1, \ldots).$$
Hence at these points, which include all the discontinuities of $P^*(x, n+1)$, we have
$$P^*(x \pm 0, n+1) - P^*(x, n) = a_{n+1}(n+1)\,V_{n+1}^{(n+1)}(x \pm 0). \tag{7·1}$$]
(b) Choice of origin and scale for the grid. The expansion (4·4) for $P^*(x)$, or the expansion (4·8) for $P_w(x)$, can be somewhat simplified by a suitable linear transformation of the original variate, designed to make the transformed c.d.f. $P(x)$ agree in mean and variance with the leading term; the two following terms then vanish. (Such a transformation is, of course, very usual in connexion with the Type A series: see §8.) This suggests the following rule: in the absence of other considerations (e.g. knowledge of the range), when the Hartley-Khamis procedure (with continuously movable grid and $h = 1$) is to be used, the mean should be reduced to 0 and the variance to $n/12$ (at least approximately); and when the alternative estimate is to be used the reductions should be to 0 and $(n+1)/12$ respectively. (In terms of the original variate we


† The use of a fixed grid and 'standard interpolation' was suggested by Hartley & Khamis (1947, §3).

‡ Note that using $n+1$ positions of the grid and Lagrange interpolation within each of the intervals $\xi_i \le x \le \xi_{i+1}$ we should get the same results as from method (i), since $P^*(x)$ is a polynomial of degree $n$ in each of these intervals.

§ In this example they used only two positions of the grid, for $\eta = 0$ and $\eta = \tfrac12$; the use of $\eta = -\tfrac12$ instead of $\eta = \tfrac12$ would thus have altered one of their estimates by as much as 0.00083. Actually, $\eta = -\tfrac12$ was the less appropriate owing to considerations of range.


Biometrika 38 6 













should have $h/\sigma = \sqrt{12/n}$ in the former case and $h/\sigma = \sqrt{12/(n+1)}$ in the latter.) Since $V_{n+1}(x)$ is more nearly normal than $V_n(x)$ the latter rule is likely to give the better results for nearly normal distributions. As an example, for the normal distribution itself, taking $n = 6$, $h = 1$, and using only the central position of the grid, the difference between $p^*(x)$ and $\bar p(x)$ was less than 0.0008 at each of the points (2·1)† when the variance was $\tfrac{7}{12}$, but was about 0.018 at $x = 0$ when $\sigma^2$ was 1, about 0.0015 when $\sigma^2$ was $\tfrac{6}{12} = \tfrac12$, and about 0.024 when $\sigma^2$ was [fraction illegible in source].
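
The rule for the scale amounts to a one-line computation, sketched below in Python as an illustration (not part of the original paper). The function names and the flag `modified`, which selects the alternative estimate of (4·8), are assumptions introduced here.

```python
from math import sqrt

def grid_ratio(n, modified=False):
    """h / sigma suggested by the rule: sqrt(12/n), or sqrt(12/(n+1)) for the alternative estimate."""
    return sqrt(12.0 / (n + 1 if modified else n))

def standardise(mean, variance, n, modified=False):
    """Return (shift, scale) such that (x - shift) * scale has mean 0 and the
    target variance n/12 (or (n+1)/12) when the grid interval is h = 1."""
    target = (n + 1 if modified else n) / 12.0
    return mean, sqrt(target / variance)

if __name__ == "__main__":
    print(grid_ratio(5, modified=True))   # sqrt(2)
    print(grid_ratio(7, modified=True))   # sqrt(3/2)
    print(standardise(10.0, 4.0, 6))      # shift 10, scale sqrt(0.5 / 4)
```

For $n = 5$ and $n = 7$ with the second rule the ratios are $\sqrt2$ and $\sqrt{3/2}$.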


(c) Possible deductions concerning magnitude of error: first method. In order to obtain any useful definite bounds for the discrepancy between $P^*(x)$ and $P(x)$ we need, of course, more information about the required distribution than is provided by its first $n$ moments. If such information relates directly to $P(x)$, or to $P'(x)$, then the formulae (2·12) and (2·13) furnish the appropriate starting-point for deductions. For instance, suppose that the range of the required distribution is contained in $[-\tfrac12 n - \tfrac12, \tfrac12 n + \tfrac12]$, that $P^{(m)}(x)$ is continuous in this interval, and that $P'(x), \ldots, P^{(m-1)}(x)$ all vanish at its end-points, where $m$ is an even integer greater than 2. Then, using the Euler-Maclaurin sum-formula (cf. Milne-Thomson, 1933, §7·5, (4) and (6), pp. 189, 190), we can obtain‡ for the central differences $p^*(x)$ and $\bar p(x)$ of $P^*(x)$ and $P(x)$ the relation§

Br e)— pie) = — Fm FY" EK 9) - FB) (7-2) 





for some $y$ such that $|y| < \tfrac12 n + 1$, where $B_m$ is a Bernoulli number, $K_n(x, y)$ is given by (2·13), (2·6) and (2·9), and $C$ is an arbitrary constant. From (7·2) we may deduce a formula given (less explicitly) by Hartley & Khamis|| for the case where $x$ is one of the numbers $-\tfrac12 n, -\tfrac12 n + 1, \ldots, \tfrac12 n$; in fact, we have only to put $C = 1$ and state the right-hand side of (7·2) as a sum of terms arising from those in $K_n(x, y) - C$, considered as a polynomial in $y$. However, while (7·2) is the formula to which we are naturally led by considering the connexion between the Hartley-Khamis procedure and Sheppard's corrections, it does not appear to give results of value in cases of practical interest; moreover, the $m$th derivative in (7·2) is an awkward expression for which to obtain bounds.

There are various possible ways of avoiding both the disadvantages just mentioned; I shall content myself here with some indications of one line of attack that is capable (in favourable cases) of establishing bounds for $p^*(x) - \bar p(x)$ that are not too large to be of interest. The idea is to transfer the factor $K_n(x, y)$, which is oscillatory [cf. (2·6) and (2·13)], from the factor that is repeatedly differentiated (in the integrations by parts that lead to the Euler-Maclaurin formula) to the factor that is repeatedly integrated. Suppose now that $P'(x), \ldots, P^{(m)}(x)$ are


† I understand that in further work (not yet published) Mr Khamis has obtained a closer fit than the above by using the odd values 5 and 7 of $n$ and choosing the ratio $h/\sigma$ empirically (with the object of covering the 'essential range'). For $n = 5$ he chose $h/\sigma = 1.4$, while the second rule above gives $\sqrt2 = 1.414$, and for $n = 7$ he chose $h/\sigma = 1.2$, while this rule gives $\sqrt{3/2} = 1.225$.

‡ For the analogous argument concerning Sheppard's corrections see Kendall (1943, pp. 69 and 70) and also Milne-Thomson (1933).

§ Note (12 June 1950). The rest of this section, which originally only gave formulae essentially equivalent to (7·2) and (7·18), has been greatly enlarged in order to meet a request from the referee for a more detailed discussion and for numerical examples, with special reference to errors in the tail groups for different values of $n$. Other changes made in response to this request are the additions at the ends of §4 and §7 (a) and an expansion of the formulae (8·6) together with the remarks accompanying them.

|| 1947 ((23), p. 345, and (10), (11), p. 342); it may be noted that the factor $R + 1$ in the last two formulae is presumably a misprint for $R + 2$, since the Euler-Maclaurin formula has been applied to $k(r, h, x)$ on the interval $[a - \tfrac12 h, b + \tfrac12 h]$, whose length is $b - a + h = (R + 2)h$, not $(R + 1)h$.







continuous everywhere and vanish outside the interval $[-\tfrac12 n - \tfrac12, \tfrac12 n + \tfrac12]$, and that $m$ is any positive integer; then we have [cf. (2·12)]
$$p^*(x) - \bar p(x) = \int_{-\frac12 n - 1}^{\frac12 n + 1} \bar p(y)\,K_n(x, y)\,dy - \bar p(x) = (-)^m \int_{-\frac12 n - 1}^{\frac12 n + 1} \bar p^{(m)}(y)\,K_n^{(-m)}(x, y)\,dy, \tag{7·3}$$
by means of $m$ integrations by parts, where
$$(\partial/\partial y)^m K_n^{(-m)}(x, y) = K_n(x, y) \quad (y \ne x), \tag{7·4}$$
$$\Big[(\partial/\partial y)^{m-1} K_n^{(-m)}(x, y)\Big]_{y = x - 0}^{y = x + 0} = -1, \tag{7·5}$$
the function $K_n^{(-m)}(x, y)$ being determined except for an added arbitrary polynomial of degree less than $m$ by (7·4), (7·5), and (when $m > 1$) the further condition that $(\partial/\partial y)^{m-2} K_n^{(-m)}(x, y)$ shall be continuous everywhere. From (7·3) we can get, first, by Schwarz's inequality (cf. Cramér, 1946, §9·5, p. 88),
$$|p^*(x) - \bar p(x)| \le \Big(\int_{-\frac12 n - 1}^{\frac12 n + 1} \{K_n^{(-m)}(x, y)\}^2\,dy\Big)^{\frac12} \Big(\int_{-\frac12 n - 1}^{\frac12 n + 1} \{\bar p^{(m)}(y)\}^2\,dy\Big)^{\frac12}. \tag{7·6}$$
The first integral on the right can be minimized by choosing the arbitrary terms in $K_n^{(-m)}(x, y)$ so as to make this function orthogonal to $1, y, \ldots, y^{m-1}$ on $[-\tfrac12 n - 1, \tfrac12 n + 1]$ (cf. Szegő, 1939, Theorem 3·1·1, p. 37). Secondly, let us make the additional assumption that $P^{(m+1)}(x)$ is bounded and integrable on $[-\tfrac12 n - \tfrac12, \tfrac12 n + \tfrac12]$. We shall write $p(x)$ for the density $P'(x)$ and put
$$M_m = \max |p^{(m)}(x)| = \max |P^{(m+1)}(x)| \quad (|x| \le \tfrac12 n + \tfrac12). \tag{7·7}$$
Since $p^{(m)}(x)$ vanishes by hypothesis for $|x| > \tfrac12 n + \tfrac12$ we have
$$|\bar p^{(m)}(y)| = \Big|\int_{y'}^{y''} p^{(m)}(x)\,dx\Big| \le M_m \min(1, \tfrac12 n + 1 - |y|), \tag{7·8}$$
by (2·2), where $[y', y'']$ is the common part of $[y - \tfrac12, y + \tfrac12]$ and $[-\tfrac12 n - \tfrac12, \tfrac12 n + \tfrac12]$ and is thus of length $\min(1, \tfrac12 n + 1 - |y|)$. From (7·3) and (7·8) we get, for every $x$,
$$|p^*(x) - \bar p(x)| \le M_m \int_{-\frac12 n - 1}^{\frac12 n + 1} |K_n^{(-m)}(x, y)|\,\min(1, \tfrac12 n + 1 - |y|)\,dy = M_m C_{n,m}(x), \tag{7·9}$$
say. While it is not practicable to minimize the integral in (7·9), yet for a given $x$ we can choose the arbitrary terms in $K_n^{(-m)}(x, y)$ so as to make $m$ of its zeros fall at suitable points in $[-\tfrac12 n - 1, \tfrac12 n + 1]$. Such a choice also reduces the labour involved in calculating the integral; even so, the computation is heavier than for the first integral in (7·6). The advantage of (7·9) over (7·6) is that supplementary information about the density $p(x)$ is required by the former only in a comparatively simple form [indeed, in the form familiar in remainder terms of formulae for approximate quadrature of $p(x)$].

As an illustration of the calculation of $C_{n,m}(x)$ we take the case $n = 6$, $m = 4$, $x = 0$. We obtain, successively [cf. (2·13)],
$$36 K_6(0, y) = 36\,l_0(y) = (1 - y^2)(4 - y^2)(9 - y^2) = 36 - 49y^2 + 14y^4 - y^6,$$
$$3780 K_6^{(-1)}(0, y) = 3780y - 1715y^3 + 294y^5 - 15y^7 - 1890\,\mathrm{sgn}\,y,$$
$$181440 K_6^{(-4)}(0, y) = 8544y^2 + 7560y^4 - 686y^6 + 42y^8 - y^{10} - 15120\,|y|^3.$$













The coefficients of $y^3$, $y^2$, $y$ and 1 in the last function were chosen to make it vanish at $y = 0$ (twice) and at $y = \pm 2$; its remaining zeros in $[-4, 4]$ were found to be where $|y| = 1.187$, 2.289, 3.520 or 3.967; elementary evaluations then yielded (cf. (7·9))
$$C_{6,4}(0) = 0.0268. \tag{7·10}$$
Again, the arbitrary terms in $K_6^{(-4)}(\pm 3, y)$ and $K_4^{(-4)}(\pm 2, y)$ having been chosen so as to make
these functions vanish where y + 4 = 0 (doubly), — 2 or 2, similar calculations yielded 
$$C_{6,4}(\pm 3) = 0.00130, \qquad C_{4,4}(\pm 2) = 0.00210. \tag{7·11}$$
These correspond to the extreme tail groups for the cases $n = 6$ and $n = 4$ when the grid is in its central position.
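
The central-group constant can be re-evaluated numerically. The Python sketch below (an illustration added here, not the author's own computation) codes the polynomial $181440\,K_6^{(-4)}(0, y)$ exactly as displayed above and integrates its absolute value against the weight $\min(1, 4 - |y|)$ of (7·9) by a plain midpoint rule; the function names and the step count are assumptions. The result should come out close to the figure quoted in (7·10), though exact agreement with a hand computation of 1950 is not to be expected.

```python
def k4_scaled(y):
    """181440 * K_6^{(-4)}(0, y), with the arbitrary cubic chosen as in the text."""
    y2 = y * y
    return (8544 * y2 + 7560 * y2 ** 2 - 686 * y2 ** 3 + 42 * y2 ** 4
            - y2 ** 5 - 15120 * abs(y) ** 3)

def c_6_4_at_0(steps=40000):
    """Midpoint-rule value of the integral in (7.9) for n = 6, m = 4, x = 0."""
    a, b = -4.0, 4.0                       # the interval [-n/2 - 1, n/2 + 1]
    h = (b - a) / steps
    total = 0.0
    for j in range(steps):
        y = a + (j + 0.5) * h
        total += abs(k4_scaled(y)) * min(1.0, 4.0 - abs(y))
    return total * h / 181440.0

if __name__ == "__main__":
    print(k4_scaled(0.0), k4_scaled(2.0))  # both zero by construction
    print(c_6_4_at_0())                    # compare with (7.10)
```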
As a simple test case for the effectiveness of (7·9) let us consider the Type II distribution with density
$$p(x) = (315/256)\,a^{-1}(1 - x^2/a^2)^4. \tag{7·12}$$
Then, $P_4(u)$ denoting a Legendre polynomial,
$$|p^{\mathrm{iv}}(x)| = (315/256)\,a^{-5} \cdot 2^4 \cdot 4!\,|P_4(x/a)| \le 945/(2a^5) \quad (|x| \le a), \tag{7·13}$$
since $|P_4(u)| \le 1$ for $|u| \le 1$ (cf. Szegő, 1939, (7·21·1), p. 159). We have also $P_4(\pm 1) = 1$ and so
$$M_4 = M_4(a) = 945/(2a^5); \tag{7·14}$$
$$M_4(2) = 14.766, \quad M_4(2.5) = 4.838, \quad M_4(3) = 1.945, \quad M_4(3.5) = 0.900.$$
In a case of this sort Hartley & Khamis (1947, §4) would arrange the scale so as to have $a = \tfrac12(n + 1)$; thus, with $n = 6$ and $a = 3.5$, (7·9)-(7·14) yield
$$|p^*(0) - \bar p(0)| < 0.0242, \qquad |p^*(\pm 3) - \bar p(\pm 3)| < 0.00118, \tag{7·15}$$
whereas the actual values of the differences here are 0.00205 and -0.000083, respectively, and $\bar p(0) = 0.34217$, $\bar p(\pm 3) = 0.004534$. For $n = 4$, $a = 2.5$, we obtain
$$|p^*(\pm 2) - \bar p(\pm 2)| < 0.0102, \tag{7·16}$$
whereas actually the difference here is -0.00024 and $\bar p(\pm 2) = 0.01960$. As a second test case
we may consider the Type I distribution with p(x) = C(1—?/a)*(1+2/a). It turns out that 
$M_4$ is then exactly twice as large as for (7·12), and the bounds in (7·15) and (7·16) would therefore be doubled; however, $\bar p(3)$ for $n = 6$, $a = 3.5$, and $\bar p(2)$ for $n = 4$, $a = 2.5$, would be nearly doubled also, though $\bar p(-3)$ [or $\bar p(-2)$] would be much smaller than before.
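
The arithmetic behind (7·14)-(7·16) is easily retraced. The following Python sketch (an illustration only) computes $M_4(a) = 945/(2a^5)$ and multiplies it by the constants quoted in (7·10) and (7·11); the constant names and the function `bound` are assumptions, and because the quoted constants are rounded the products agree with the printed bounds only to about the last figure given.

```python
def m4(a):
    """M_4(a) = 945 / (2 a^5), the bound (7.14) for the density (7.12)."""
    return 945.0 / (2.0 * a ** 5)

# Constants as quoted in (7.10) and (7.11) (rounded values from the text).
C_6_4_CENTRE = 0.0268
C_6_4_TAIL = 0.00130
C_4_4_TAIL = 0.00210

def bound(a, c):
    """Right-hand side M_4(a) * C_{n,4}(x) of (7.9)."""
    return m4(a) * c

if __name__ == "__main__":
    for a in (2.0, 2.5, 3.0, 3.5):
        print(a, m4(a))
    print(bound(3.5, C_6_4_CENTRE), bound(3.5, C_6_4_TAIL))   # cf. (7.15)
    print(bound(2.5, C_4_4_TAIL))                             # cf. (7.16)
```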

The values of $C_{n,m}(x)$ in (7·10) and (7·11) correspond to the central position of the grid, but can be used also when the grid is shifted for purposes of interpolation, since we may (equivalently) shift the distribution instead of the grid provided that we start now with a $P(x)$ that is constant outside $[-\tfrac12 n, \tfrac12 n]$. For the example (7·12) we now take $a = \tfrac12 n$ instead of $a = \tfrac12 n + \tfrac12$, with a consequent magnification of $M_4$ [cf. (7·14)]; in this way we can obtain from (7·11)-(7·14), when $n = 6$, $a = 3$,
$$|P^*(x) - P(x)| < 0.0026 \quad (-3 \le x \le -2),$$
this interval covering one-sixth of the range and actually containing probability 0.0090, and, when $n = 4$, $a = 2$,
$$|P^*(x) - P(x)| < 0.0310 \quad (-2 \le x \le -1),$$
this interval covering a quarter of the range and actually containing probability 0.0492. Presumably, better results could be obtained if values of $C_{n,m}(x)$ for non-integral values of $x + \tfrac12 n$ were calculated specially, since then we could still take $a = \tfrac12 n + \tfrac12$.


(d) Magnitude of error: second method. We have been supposing, so far, that our additional 
information about the required distribution was in terms of P(x) or P’(x); however (as will 
often be the case with sampling distributions), this information may relate to the characteristic 










function (ch.f.) $\phi(t)$. Formula (5·1) may then be of use, since we can deduce from it a formula for $\phi(t) - \phi^*(t)$, where $\phi^*(t)$ is the ch.f. for $P^*(x)$. In fact, from the definition of $V_n(x)$ [cf. §4, (I)] it follows that the ch.f. corresponding to it is $(\sin\tfrac12 t)^n/(\tfrac12 t)^n$ (cf. Uspensky, 1937, p. 305); hence that for $V_n^{(r)}(x)$ is $(-it)^r(\sin\tfrac12 t)^n/(\tfrac12 t)^n$, and so, by (4·4),
$$\phi^*(t) = \frac{(\sin\tfrac12 t)^n}{(\tfrac12 t)^n} \sum_{r=0}^{n} a_r(-it)^r. \tag{7·17}$$
Let us introduce the functions $\tau^*(t)$ and $R_n(t)$ by
$$(\tfrac12 t/\sin\tfrac12 t)^n\{\phi(t) - \phi^*(t)\} = \tau^*(t) - \sum_{r=0}^{n} a_r(-it)^r = R_n(t). \tag{7·18}$$
Since (cf. §3) the moments of orders not exceeding $n$ are the same for $P(x)$ and $P^*(x)$, the Maclaurin series for $\phi(t)$ and $\phi^*(t)$ must agree until after the term in $t^n$; hence $R_n(t)$ is simply the remainder after the term in $t^n$ in the Maclaurin series for $\tau^*(t) = (\tfrac12 t/\sin\tfrac12 t)^n\phi(t)$. Similarly, if $\phi_w(t)$ is the ch.f. corresponding to $P_w(x)$, we obtain [cf. (4·8) and (4·10)]
$$(\tfrac12 t/\sin\tfrac12 t)^{n+1}\{\phi(t) - \phi_w(t)\} = \tau_w(t) - \sum_{r=0}^{n} c_r(-it)^r = r_n(t) = (\tfrac12 t)^{n+2}\rho_n(t), \tag{7·19}$$
say, where $r_n(t)$ is the remainder after $t^n$ in the Maclaurin series for $\tau_w(t) = (\tfrac12 t/\sin\tfrac12 t)^{n+1}\phi(t)$. Now [cf. (2·2)] $\bar p(x) = \delta P(x)$, and so, if we write $\bar p_w(x) = \delta P_w(x)$, we shall have the formula
$$\bar p(x) - \bar p_w(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{\sin\tfrac12 t}{\tfrac12 t}\,e^{-itx}\{\phi(t) - \phi_w(t)\}\,dt, \tag{7·20}$$
which holds wherever $\bar p(x)$ (and therefore also the left-hand side) is continuous, and also a similar formula for $P(x) - P(-x) - \{P_w(x) - P_w(-x)\}$, with $\sin xt$ instead of $(\sin\tfrac12 t)e^{-itx}$ in the integrand (cf. Cramér, 1946, §10·3, p. 93). From (7·19) and (7·20) we may obtain the following uniform bound for the left-hand side of the latter:


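$$\begin{aligned}
|\bar p(x) - \bar p_w(x)| &\le \frac{1}{2\pi}\int_{-\infty}^{\infty} \Big|\frac{\sin\tfrac12 t}{\tfrac12 t}\Big|\,|\phi(t) - \phi_w(t)|\,dt \\
&\le \frac{1}{2\pi}\int_{-2\pi}^{2\pi} |\sin^{n+2}\tfrac12 t\;\rho_n(t)|\,dt + \frac{1}{2\pi}\int_{|t| > 2\pi} \Big|\frac{\sin\tfrac12 t}{\tfrac12 t}\Big|\,|\phi(t)|\,dt + \frac{1}{2\pi}\int_{|t| > 2\pi} \Big|\frac{\sin\tfrac12 t}{\tfrac12 t}\Big|\,|\phi_w(t)|\,dt \\
&= I_1 + I_2 + I_3,
\end{aligned} \tag{7·21}$$
say. The range $[-2\pi, 2\pi]$ has been separated out because, in general, the series for $\tau_w(t)$ will not converge outside it owing to poles of $\tau_w(t)$ at $t = \pm 2\pi$. Clearly,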





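$$I_3 \le \sum_{r=0}^{n} 2^r\,|c_r|\,S(n + 2,\ n + 2 - r), \tag{7·22}$$
say, where the $c_r$'s can be calculated from the known moments by (4·8) and (4·5), while $S(k, l)$ is an integral that (when $k$ is even) can be reduced to cosine integrals and so evaluated accurately, but which in most cases can be overestimated with little loss by the formula
$$S(k, l) = \frac{2}{\pi}\int_{\pi}^{\infty} \frac{|\sin\theta|^k}{\theta^l}\,d\theta \le \Big(\frac{1}{\pi}\int_0^{\pi} \sin^k\theta\,d\theta\Big)\,\frac{2}{\pi}\int_{\pi}^{\infty} \frac{d\theta}{\theta^l} \quad (k > 0,\ l > 1). \tag{7·23}$$
This inequality is obtainable by combining the cases $\alpha = 0$, $g(\theta) = (j\pi + \theta)^{-l}$ $(j = 1, 2, \ldots)$ of the inequality
$$\int_{\alpha}^{\pi - \alpha} \sin^k\theta\;g(\theta)\,d\theta \le \frac{1}{\pi - 2\alpha}\Big(\int_{\alpha}^{\pi - \alpha} \sin^k\theta\,d\theta\Big)\Big(\int_{\alpha}^{\pi - \alpha} g(\theta)\,d\theta\Big) \tag{7·24}$$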
















which is valid when $0 \le \alpha < \tfrac12\pi$, $k > 0$, and $g''(\theta) \ge 0$ $(\alpha \le \theta \le \pi - \alpha)$, in virtue of Tchebycheff's inequality for integrals (cf. Hardy, Littlewood & Pólya, 1934, Theorem 236, p. 168), since $\sin^k\theta$ and $g(\theta) + g(\pi - \theta)$ are respectively increasing and decreasing for $\alpha \le \theta \le \tfrac12\pi$.

While the methods to be chosen for overestimating the terms $I_1$ and $I_2$ in (7·21) must depend upon $\phi(t)$, and in general the complex integral form of the remainder $r_n(t)$ in (7·19) may be needed, a comparatively simple treatment will suffice if $\phi(t)$ is of a suitable special type. For instance, suppose that (i) the required distribution is symmetrical, (ii) the Maclaurin series of the function $\tau_w(t)$ in (7·19) has all its coefficients non-negative,† and (iii) this series is convergent for $|t| < 2\pi$. We suppose that a convenient explicit formula is available for $\phi(t)$; and also, for definiteness and slightly greater simplicity, that $n$ is even. Then, first, $\phi(t) - \phi_w(t)$ is an even function of $t$ and for $|t| < 2\pi$ is non-negative; further, $\rho_n(-t) = \rho_n(t)$, and, for $0 < t < 2\pi$, $\rho_n(t) > 0$, $\rho_n'(t) > 0$, and $\rho_n''(t) > 0$. Thus we can overestimate the parts of $I_1$ by using the inequalities


0<p,(t)<p,(47) (0<t<4n), O<htp,(t)sin™ti<d(t) (7<t<2m), (7-25) 


( | e | ") sin"#? jt | p,(t) | dt < 2Pald™) + Pal) + $Pn(e7) | "sin™+29d0, (7-26) 
71 —in 7 7 in 

the latter inequality resulting from (7-24) with a = 47,k =n+2, g(9) =p,(20) =p,(t), 
followed by use of the trapezium rule to overestimate the integral of $\rho_n(2\theta)$ concerned. The values of $\rho_n(t)$ on the right-hand sides of (7·25) and (7·26) are obtainable by (7·19) from the corresponding values of $\phi(t)$. The right-hand side of (7·21) can thus be overestimated by a sum of expressions for which bounds are calculable when a formula for $\phi(t)$ is known. The expression for $P(x) - P(-x) - \{P_w(x) - P_w(-x)\}$, mentioned above after (7·20), can be treated on similar lines, though preferably without immediate resort to absolute values in the integrands.

To test the effectiveness of the programme just outlined some numerical results were worked out for the distribution of the sum $v$ of $m$ independent sample values of a variate $u$ distributed with density $\tfrac12\pi\,\mathrm{sech}^2\pi u$. For $v$ the ch.f. is‡ $\psi(t) = (\tfrac12 t/\sinh\tfrac12 t)^m$. (It is possible to obtain a formula for the corresponding density, namely (when $m$ is even),


(d/dv)™[{(4m — 1)* + v®} {(4m — 2)? + v?} ... {12+ v?}] 3v coth mv/{(m— 1)}}, 


but this would be very awkward for computation except for quite small values of $m$.) It can be verified easily that if we change the scale so as to make the variance equal to $(n + 1)/12$ (according to the second rule put forward in §7 (b) above), then instead of $\psi(t)$ we get
$$\phi(t) = \psi\{t(n + 1)^{\frac12}m^{-\frac12}\} = \{\tfrac12 t(n + 1)^{\frac12}m^{-\frac12}\}^m / \sinh^m\{\tfrac12 t(n + 1)^{\frac12}m^{-\frac12}\};$$
also that the conditions (i), (ii), and (iii) above are fulfilled if $n + 1 \le m$. (Indeed, the condition in the footnote to (ii) is fulfilled.) The numerical results obtained with $m = 10$, $n = 4$, $\sigma^2 = 5/12$, were as follows:





$$|\bar p(x) - \bar p_w(x)| < 0.0074 \quad (-\infty < x < \infty), \tag{7·27}$$
$$\bar p_w(\tfrac12) = 0.440190, \quad \bar p_w(\tfrac32) = 0.058464, \quad \bar p_w(\tfrac52) = 0.001346;$$
$$|P(\tfrac32) - P(-\tfrac32) - \{P_w(\tfrac32) - P_w(-\tfrac32)\}| < 0.0053; \tag{7·28}$$
$$1 - \{P_w(\tfrac32) - P_w(-\tfrac32)\} = 0.021069.$$

† This holds, in particular, when the cumulants $\kappa_2, \kappa_4, \kappa_6, \ldots$ for $P(x)$ exceed, respectively, those for $V_{n+1}(x)$ by numbers that are alternately non-positive and non-negative.

‡ For relevant formulae see Campbell & Foster (1948, formulae 614, p. 70, and 607·0, p. 68).







In the case $m = 10$, $n = 6$, $\sigma^2 = \tfrac{7}{12}$, the results were
$$|\bar p(x) - \bar p_w(x)| < 0.0031 \quad (-\infty < x < \infty), \tag{7·29}$$
$$\bar p_w(\tfrac12) = 0.405585, \quad \bar p_w(\tfrac32) = 0.089629, \quad \bar p_w(\tfrac52) = 0.004726, \quad \bar p_w(\tfrac72) = 0.000060;$$
$$|P(2) - P(-2) - \{P_w(2) - P_w(-2)\}| < [\text{bound illegible in source}]; \tag{7·30}$$
$$1 - \{P_w(2) - P_w(-2)\} = 0.009572.$$

It may be recalled from §4 (II), that, when $n$ is even, $P^*(j) = P_w(j)$ and $p^*(j + \tfrac12) = \bar p_w(j + \tfrac12)$ for $j = 0, \pm 1, \pm 2, \ldots$, and also, by (4·10), $P_w(x, n) = P^*(x, n + 1)$ for a symmetrical distribution.

(e) Magnitude of error: summing-up. Of the formulae obtained in §7 (c) and (d), we may say that formula (7·9), which requires knowledge of a bound for $|p^{(m)}(x)|$, provides a means of proving, in suitable cases, that the Hartley-Khamis estimates are at least rough approximations to the true probabilities (though possibly very rough for the extreme tail groups), and that formulae (7·19) and (7·21) can be made to yield, in cases where the characteristic function is known explicitly and is of a suitable special type, a fairly good uniform bound for the (absolute) errors in the (modified) Hartley-Khamis estimates $\bar p_w(x)$. Both methods are susceptible of refinement to some extent, and in the numerical examples worked out both give better bounds when the number of moments is increased. Further, in these examples the bounds obtained for absolute errors in the tail groups are smaller than for those in the central group, although the bounds for relative error are larger [cf. (7·10), (7·11), (7·15) and (7·27)-(7·30)]; comparison with the known true values available in the example (7·12) shows that the same relations obtain when we consider actual errors† instead of bounds for errors.
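
The change of scale used for the test distribution of §7 (d) can be checked numerically. The Python sketch below (an illustration only, not part of the original paper) codes the ch.f. $(\tfrac12 t/\sinh\tfrac12 t)^m$ and its rescaled form, and recovers the variance from the curvature of the ch.f. at the origin by a central second difference; the function names and the step `eps` are assumptions made here.

```python
from math import sinh, sqrt

def psi(t, m):
    """Ch.f. (t/2 / sinh(t/2))^m of the sum of m variates, as given in the text."""
    if t == 0.0:
        return 1.0
    x = 0.5 * t
    return (x / sinh(x)) ** m

def phi(t, m, n):
    """Rescaled ch.f., intended to have variance (n + 1) / 12."""
    return psi(t * sqrt((n + 1) / m), m)

def variance_from_chf(f, eps=1e-3):
    """-f''(0) by a central second difference (valid for a real, even ch.f.)."""
    return -(f(eps) - 2.0 * f(0.0) + f(-eps)) / eps ** 2

if __name__ == "__main__":
    print(variance_from_chf(lambda t: psi(t, 10)), 10 / 12)       # before rescaling
    print(variance_from_chf(lambda t: phi(t, 10, 4)), 5 / 12)     # m = 10, n = 4
    print(variance_from_chf(lambda t: phi(t, 10, 6)), 7 / 12)     # m = 10, n = 6
```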


8. COMPARISON WITH THE GRAM-CHARLIER TYPE A SERIES

Let us consider a Type A series for $P(x)$ carried as far as the known moments (of orders not exceeding $n$) permit; we shall write
$$P_h(x) = \sum_{r=0}^{n} d_r U^{(r)}(x), \qquad d_r = \int_{-\infty}^{\infty} h_r(y)\,dP(y), \tag{8·1}$$
$$U(x) = \Phi(x/\sigma'), \qquad h_r(y) = (-\sigma')^r H_r(y/\sigma')/r! = (-d/dy)^{n-r} h_n(y), \tag{8·2}$$
where $\Phi(u)$ is the reduced normal c.d.f. and $H_r(u)$ the Hermite polynomial of degree $r$. (For the relevant formulae cf. Cramér (1946, §17·6, pp. 221-3); also Kendall (1943, §§6·20-6·23, pp. 145-8).) Further, in the reduced case where $P(x)$ agrees in mean and variance with $U(x)$ (so that $\mu_1' = 0$, $\mu_2 = \sigma^2 = \sigma'^2$) we have, on writing $\gamma_{r-2} = \kappa_r/\sigma^r$ $(r = 3, 4, \ldots)$ for the reduced cumulants,
$$d_0 = 1, \quad d_1 = d_2 = 0, \quad d_3 = -\gamma_1\sigma^3/3!, \quad d_4 = \gamma_2\sigma^4/4!, \quad d_5 = -\gamma_3\sigma^5/5!, \quad d_6 = (\gamma_4 + 10\gamma_1^2)\sigma^6/6!, \ \ldots. \tag{8·3}$$
We now note first that there is a formal analogy between the formulae (8·1) and (8·2) for $P_h(x)$ and the formulae (4·8) and (4·5) for $P_w(x)$, with $U(x)$ corresponding to $V_{n+1}(x)$ and $\sigma'^n H_n(y/\sigma')$ to $\phi_n(y)$. Moreover, both $P_h(x)$ and $P_w(x)$ are fitted by moments (cf. §6), and both expansions involve a normalized biorthogonal system of functions of the type (3·5) (although in the case of $P_w(x)$ there is no corresponding set of orthogonal polynomials). In the reduced case for both $P_h(x)$ and $P_w(x)$ (where $\mu_1' = 0$ and $(n + 1)/12 = \sigma^2 = \sigma'^2$) we have $c_1 = 0 = d_1$; and we find that $w_2(y) = h_2(y)$ and $w_3(y) = h_3(y)$ (so that $c_2 = d_2$ and $c_3 = d_3$), while $w_r(y) - h_r(y)$

† The numerical examples given by Hartley & Khamis (1947, §§4 and 7) also exhibit these relations between actual errors. See also the remarks following (8·6) below.










is of degree only $r - 4$ when $4 \le r \le n$; there is also a fairly close mutual approximation between the two leading terms $V_{n+1}(x)$ and $U(x)$ when we take $\sigma'^2 = (n + 1)/12$, even for not very large values of $n$ (cf. Uspensky, 1937, p. 305, and Haldane, 1945).

Turning to the expansion of $P^*(x)$ we note that (4·5) yields easily the relation
$$\omega_r(y) = (-d/dy)^{n-r}\,\omega_n(y). \tag{8·4}$$
We now have an analogy between the formulae (8·1) and (8·2) for $P_h(x)$ and the formulae (4·4) and (8·4) for $P^*(x)$ like that already found for $P_h(x)$ and $P_w(x)$. In the reduced case for both $P_h(x)$ and $P^*(x)$ (where $\mu_1' = 0$ and $n/12 = \sigma^2 = \sigma'^2$) we have $a_1 = 0 = d_1$, $a_2 = d_2$ and $a_3 = d_3$. The mutual approximation between the leading terms, in this case $\Phi\{x(n/12)^{-\frac12}\}$ and $V_n(x)$, is, of course, slightly less close than that between $\Phi[x\{(n + 1)/12\}^{-\frac12}]$ and $V_{n+1}(x)$ noted above.


To illustrate the numerical resemblance between $P_1(x)$ and $P_2(x)$ we consider the case
$n = 5$, $\mu_1 = 0$, $\sigma^2 = \sigma'^2 = (5+1)/12 = \tfrac12$; then, by (ID),
$$P^*(k) = P_1(k) \quad (k = 0, \pm1, \pm2, \pm3, \ldots).$$
Using these relations and the notations above we find
$$\begin{aligned}
P^*(0) - P_2(0) &= 10^{-3}(7.564\gamma_1 - 1.1347\gamma_3),\\
P^*(1) - P_2(1) &= 10^{-3}(0.1774 - 4.819\gamma_1 + 1.8727\gamma_2 + 0.2226\gamma_3),\\
P^*(2) - P_2(2) &= 10^{-3}(-0.0918 + 1.296\gamma_1 - 0.9026\gamma_2 + 0.3162\gamma_3),\\
P^*(3) - P_2(3) &= 10^{-3}(0.0110 - 0.140\gamma_1 + 0.1309\gamma_2 - 0.0904\gamma_3).
\end{aligned} \tag{8-5}$$
(The results for $k = -1$, $-2$ and $-3$ may be obtained from those for 1, 2 and 3 respectively
by changing the signs of the terms not containing $\gamma_1$ or $\gamma_3$.) In particular, if
$$|\gamma_r| < 0.1 \quad (r = 1, 2, 3)$$
these differences are less in absolute value than 0.00087, 0.00087, 0.00035 and 0.00004
respectively; for a symmetrical distribution these figures can be replaced by 0, 0.00037,
0.00019 and 0.00003 respectively. For comparison, when $n = 7$, $\mu_1 = 0$, $\sigma^2 = \sigma'^2 = (1.2)^{-2}$
(which approximates to (7 + 1)/12), and when also the required distribution is symmetrical,
we get
$$\begin{aligned}
P^*(0) - P_2(0) &= 0,\\
P^*(1) - P_2(1) &= 10^{-3}(0.008 - 0.063\gamma_2 - 0.298\gamma_4),\\
P^*(2) - P_2(2) &= 10^{-3}(-0.001 + 0.041\gamma_2 + 0.227\gamma_4),\\
P^*(3) - P_2(3) &= 10^{-3}(-0.002 - 0.062\gamma_2 - 0.014\gamma_4),\\
P^*(4) - P_2(4) &= 10^{-3}(0.001 + 0.016\gamma_2 + 0.008\gamma_4).
\end{aligned} \tag{8-6}$$
If $|\gamma_r| < 0.1$ (r = 2, 4) the last four differences are less in absolute value than 0.000044,
0.000028, 0.000010 and 0.000003 respectively. It is clear that by bringing in the sixth order
moment we shall usually get a considerably closer agreement between the two estimates.
Moreover, both for n = 5 and for n = 7, the differences are in general smaller in the tails than
in the middle, although, of course, the relative discrepancies in the tails may be greater.
The numerical approximation of $P^*(x)$ (or of $P_1(x)$) to $P_2(x)$ (with appropriate $\sigma'$) appears
to be moderately good when P(x) is close to either of the leading terms in the corresponding
expansions; however, there are cases, including the first numerical example given by
Hartley & Khamis (1947, §4), in which this approximation is less close than that of $P^*(x)$ to
P(x). Such cases thus belong to the class of distributions for which the Hartley-Khamis
procedure works better than the Type A expansion. It would seem that information about


the extent of this class should be the first objective of further research on the procedure. 
In the meantime, in view of the formal analogy between the Type A series (8-1) and the 
expansion (4-4) of P*(x) we may conjecture that a distribution will belong to this class when 
its c.d.f. P(x) is closer to $V_n(x)$ than to $\Phi\{x(n/12)^{-1/2}\}$. Owing to the properties of $V_n(x)$ noted in
§4 this will often happen when the distribution in question has a finite range and is slightly
platykurtic. (The numerical example just quoted is of this type; it is one of Hartley & Khamis's
most successful applications of their procedure.) Similar remarks apply to $P_1(x)$.


REFERENCES 


Campbell, G. A. & Foster, R. M. (1948). Fourier Integrals for Practical Applications. New York: Van
Nostrand Co.

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton: Princeton University Press.

Haldane, J. B. S. (1945). Phil. Mag. 36, 184-5.

Hardy, G. H., Littlewood, J. E. & Pólya, G. (1934). Inequalities. Cambridge University Press.

Hartley, H. O. & Khamis, S. H. (1947). Biometrika, 34, 340-51.

Kendall, M. G. (1943). The Advanced Theory of Statistics, 1. London: Charles Griffin and Co.

Milne-Thomson, L. M. (1933). The Calculus of Finite Differences. London: Macmillan.

Szegő, G. (1939). Orthogonal Polynomials. New York: American Mathematical Society.

Uspensky, J. V. (1937). Introduction to Mathematical Probability. New York and London: McGraw-Hill
Book Co.













ESTIMATION PROBLEMS WHEN A SIMPLE TYPE OF 
HETEROGENEITY IS PRESENT IN THE SAMPLE 


By W. M. LONG 
Central Research Establishment, National Coal Board. 


Introduction. This paper is concerned with a problem which the writer has often
encountered in practical work; the following are three typical examples:

(i) Several operators each make a number of determinations of some physical constant; 
it is observed that there is an ‘operator effect’, i.e. that the sets of results obtained by the 
different operators disagree to an extent which, although small, cannot be dismissed as 
accidental. It is desired to estimate, and set confidence limits for, the ‘true value’ of the 
constant. 

(ii) A certain characteristic of the output of a factory is being examined. For various 
reasons it is impossible to examine the whole of the output, and instead a number of samples
are taken on each of several days. A significant day-to-day variation is in evidence; this 
variation appears to be quite random. Confidence limits are required for the true average 
value of the characteristic. 

(iii) A certain type of house is being erected in varying numbers on several different sites, 
the time taken to erect each house being observed. It is known that the physical conditions 
on a site affect the erection times, and that these conditions vary considerably from site to 
site. It is desired to estimate the average time which would be observed, in the long run, on 
the average site. 

The common feature of these problems is the complexity of the sample, wherein the 
elements fall into a number of groups, each group being associated with a different set of 
physical conditions (in example (iii), for instance, each site comprises one set of conditions). 
The assumption is made that the sets from which the available information is derived con- 
stitute a sample from a population of such sets, and the problem is to estimate the mean of 
the population which is made up from all the possible observations which could arise from 
all the possible sets. 


1. In mathematical terms, we may sum up the situation as follows: 

We are given a sample O of values of a variate, O being composed of k subsamples
$o_i$ (i = 1, ..., k); systematic differences of some kind exist between the subsamples. It is
required to estimate the mean of the population from which O is drawn.

Any valid solution must depend on what we know, or are prepared to assume, about the 
parent population from which the observations are drawn, and in particular about the 
nature of the systematic differences between the subsamples. The assumptions upon which 
most of the investigation which follows is based are: 

(i) That the subsample $o_i$ may be regarded as a random sample from a normal population,
$p_i$ say, with mean $a_i$ and variance $\sigma^2$. $\sigma^2$ is the same for all the subpopulations $p_i$, but $a_i$ varies.
The systematic differences between the subsamples are to be ascribed to the differences
between the $a_i$'s.

(ii) That the set of $a_i$'s associated with those $p_i$ which have contributed to O may be regarded
as a random sample from a normal population with mean $a$ and variance $\sigma_a^2$.

Our problem then is the estimation of a, and we are not interested in the subpopulation 
means a; except in so far as they will assist in the solution. 













It is not, of course, asserted that these assumptions will always apply to such physical 
situations as we have described above. They are, however, of a type that will often be made, 
especially when not much information is available. 

Certain relaxations in the assumptions will be considered at a later stage; for the present, 
however, they will be retained. 


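2. Denote by $x_{ij}$ the jth element of the subsample $o_i$; let the number of elements in $o_i$ be
$n_i$ and let $\sum_{i=1}^{k} n_i = N$, so that N = the total number of observations in O. Let
$$\bar{x}_i = \sum_{j=1}^{n_i} x_{ij}/n_i \quad\text{and}\quad \bar{x} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}/N.$$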

Consider for the moment a single subsample $o_i$. The probability of obtaining the observed
set of x's is the probability of obtaining $a_i$ on the hypothesis of random sampling from the
population $N(a, \sigma_a^2)$* times the conditional probability of obtaining those x's by random
sampling from $N(a_i, \sigma^2)$, summed over all the possible $a_i$'s, i.e. is
$$\{(2\pi)^{\frac12(n_i+1)}\sigma^{n_i}\sigma_a\}^{-1}\int_{-\infty}^{\infty}\exp\Big[-\tfrac12\Big\{\frac{(a_i-a)^2}{\sigma_a^2}+\sum_{j=1}^{n_i}\frac{(x_{ij}-a_i)^2}{\sigma^2}\Big\}\Big]\,da_i\ \prod_{j=1}^{n_i}dx_{ij}.$$

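Carrying out the necessary integrations of this type we find, after some algebra, that for
the whole sample O the probability element is
$$\prod_{i=1}^{k}\Big[(2\pi)^{-\frac12 n_i}\,\sigma^{-(n_i-1)}\,\{n_i(\sigma_a^2+\sigma^2/n_i)\}^{-\frac12}\exp\Big\{-\tfrac12\Big(\frac{(\bar{x}_i-a)^2}{\sigma_a^2+\sigma^2/n_i}+\sum_{j=1}^{n_i}\frac{(x_{ij}-\bar{x}_i)^2}{\sigma^2}\Big)\Big\}\prod_{j=1}^{n_i}dx_{ij}\Big].$$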

It will be observed that the mean and variance of each subsample are distributed inde- 
pendently of one another and of all the other subsample means and variances; the probability 
element can obviously be expressed as the product of factors of the required type. 

When the $n_i$'s are all equal the estimation problem offers no difficulty. If we put $n_i = n$
for all i, the set of means $\bar{x}_1, \ldots, \bar{x}_k$ may be regarded as a random sample from the population
$N(a, \sigma_a^2 + \sigma^2/n)$; it follows that the statistic
$$t = \frac{\bar{x}-a}{\Big\{\sum_{i=1}^{k}(\bar{x}_i-\bar{x})^2/k(k-1)\Big\}^{\frac12}}$$
is distributed as Student's t with (k - 1) degrees of freedom. It may readily be verified that this
is the solution reached by application of the likelihood ratio technique.

In particular, if $n_i = 1$ for all i, we have a sample of k observations each of which may be
regarded as a more or less inaccurate estimate of the corresponding true variate-value $a_i$, and
we have the corollary that the Student t-test is still exact when the observations on which it is
based are inaccurately measured, provided that the errors of measurement have no systematic
bias and all follow the same Gaussian law.





3. We now turn to the case where the $n_i$'s are not all equal.
The maximum likelihood method does not now lead to any practical result. We find for
the estimator of $a$ the weighted mean
$$\sum_{i=1}^{k}(\hat\sigma_a^2+\hat\sigma^2/n_i)^{-1}\bar{x}_i\Big/\sum_{i=1}^{k}(\hat\sigma_a^2+\hat\sigma^2/n_i)^{-1}.$$

* Here and below this notation is used to indicate a population represented by a normal
distribution with mean $a$ and variance $\sigma_a^2$.










In numerical cases the weights have to be obtained by the solution of a set of three simultaneous
equations, two of which may be of degree up to the 2kth in the unknowns. We shall
not, therefore, consider the maximum likelihood solution any further, beyond noticing that
when the $n_i$'s all become large the solution for $a$ becomes $\hat{a} = \sum_{i=1}^{k}\bar{x}_i/k$, a form to which we
shall return later.


The obvious alternative approach is to use some form of weighted mean of the observations, 


the system of weighting used to be decided by the precision of the estimator obtained and by 
its adaptability to interval estimation. 


On intuitive grounds, there seems no point in using different weights for different observa- 
tions from the same subsample; this means, in effect, that we need consider only weighted 
means of the subsample means, which simplifies matters. 


4. The 'grand mean', $\bar{x} = \sum_{i=1}^{k} n_i\bar{x}_i\big/\sum_{i=1}^{k} n_i = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}/N$, is at first sight a possible alternative
estimator. Difficulties arise, however, when its use for interval estimation is considered.
Its sampling variance is $\big(\sum n_i^2/N^2\big)\sigma_a^2 + \sigma^2/N$, which can be estimated from the sample using
a linear combination of sums of squares of the form
$$b_1\sum_{i=1}^{k} n_i(\bar{x}_i-\bar{x})^2/(k-1) + b_2\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2/(N-k),$$
where the b's are functions of the $n_i$'s. $b_2$ is negative, being equal to
$$\Big(N^2 - k\sum_{i=1}^{k} n_i^2\Big)\Big/\Big\{N\Big(N^2-\sum_{i=1}^{k} n_i^2\Big)\Big\}.$$
Moreover, the sum of squares $\sum_{i=1}^{k} n_i(\bar{x}_i-\bar{x})^2$ is not distributed as $\chi^2$.

5. An exact solution of the interval estimation problem, as the above examples indicate,
appears to be a matter of some difficulty. There may, indeed, exist some system of weighting
the $\bar{x}_i$'s from which one could derive a Studentized deviate whose sampling distribution
would not be too unmanageable, but the writer has not discovered one. Instead, an approximate
solution is suggested, based on the unweighted mean of the subsample means, which
we shall denote by $\bar{x}'$, so that
$$\bar{x}' = \sum_{i=1}^{k}\bar{x}_i/k.$$

The sampling variance of $\bar{x}'$ can be shown to be
$$\frac{\sigma_a^2}{k}+\frac{\sigma^2}{k^2}\sum_{i=1}^{k}\frac{1}{n_i}.$$
This variance is readily estimated from the sample, since the unweighted 'between-means'
variance $\sum_{i=1}^{k}(\bar{x}_i-\bar{x}')^2/(k-1)$ has expectation
$$\sigma_a^2+\frac{\sigma^2}{k}\sum_{i=1}^{k}\frac{1}{n_i},$$
which is k times the sampling variance of $\bar{x}'$.

The solution is obtained by defining, by analogy with Student's ratio, a statistic
$$t' = \frac{\bar{x}'-a}{\Big\{\sum_{i=1}^{k}(\bar{x}_i-\bar{x}')^2/k(k-1)\Big\}^{\frac12}}$$
and assuming as an approximation that $t'$ is distributed as Student's t with (k - 1) degrees of
freedom.
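The computation of $t'$ may be sketched in a few lines of Python. This is only an illustration: the function name `unweighted_t` and the numerical data are hypothetical and do not come from the paper, and the reference value `a0` plays the part of the hypothetical mean $a$.

```python
from math import sqrt

def unweighted_t(subsamples, a0=0.0):
    """Return (xbar_prime, t_prime, degrees_of_freedom).

    subsamples: list of lists of observations, one list per subsample.
    a0: hypothetical value of the population mean a (an assumption of the test).
    """
    k = len(subsamples)
    if k < 2:
        raise ValueError("at least two subsamples are needed")
    # Subsample means, each given equal weight whatever its size.
    means = [sum(s) / len(s) for s in subsamples]
    xbar_prime = sum(means) / k
    # Unweighted between-means sum of squares.
    ss = sum((m - xbar_prime) ** 2 for m in means)
    standard_error = sqrt(ss / (k * (k - 1)))
    return xbar_prime, (xbar_prime - a0) / standard_error, k - 1

# Hypothetical data: four operators with unequal numbers of determinations.
groups = [[10.1, 9.9], [10.4, 10.6, 10.5], [9.7], [10.2, 10.0, 10.3, 10.1]]
xbar_prime, t_prime, df = unweighted_t(groups, a0=10.0)
print(xbar_prime, t_prime, df)
```

The value of `t_prime` would then be referred to a table of Student's t with `df` degrees of freedom.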

In attempting to justify this procedure, we have to consider first the efficiency of $\bar{x}'$ and
secondly the closeness of the approximation.

At first sight $\bar{x}'$ may appear a rather crude estimator of $a$, since it takes no account of the
differences in the subsample sizes. Closer examination, however, reveals some points
favouring its use.

First, as we have already noticed, when the $n_i$'s all become large it tends to coincidence
with the maximum likelihood estimator. Secondly, although the grand mean $\bar{x}$ may appear
a more natural choice on intuitive grounds, $\bar{x}'$ will often be the more efficient of the two.
For if we subtract the sampling variance of $\bar{x}'$ from that of $\bar{x}$ we get
$$\sigma_a^2\Big[\frac{\sum_{i=1}^{k} n_i^2}{N^2}-\frac{1}{k}\Big]+\sigma^2\Big[\frac{1}{N}-\frac{1}{k^2}\sum_{i=1}^{k}\frac{1}{n_i}\Big].$$
The term in the first bracket is always positive; that in the second is always negative, from
the property that the arithmetic mean of a set of positive numbers, not all alike, is greater
than their harmonic mean. Which of the two estimators is the more efficient depends therefore
on the values of the $n_i$'s and of the ratio $\sigma^2/\sigma_a^2$, in any particular case. For a given set of
$n_i$'s, the smaller $\sigma^2$ is compared with $\sigma_a^2$, the more likely it is that $\bar{x}'$ is the better estimator.
As a numerical illustration a few examples are given in the tables below of the efficiencies
of both $\bar{x}'$ and $\bar{x}$ for some sets of $n_i$'s and a range of values of the ratio $\sigma^2/\sigma_a^2$. There is a point
of some difficulty here, namely, how, in terms of the variances $\sigma^2$ and $\sigma_a^2$, efficiency should
be defined. The normal method of obtaining the asymptotic or minimum variance for
efficient estimators, by finding the Hessian matrix of the likelihood function, may not be
valid, since the subsamples vary in size. Nor is it necessarily appropriate to proceed by finding
the values of $c_i$ which minimize the variance of the linear function $\sum_{i=1}^{k} c_i\bar{x}_i\big/\sum_{i=1}^{k} c_i$; if we do so,
subject to the condition $\sum c_i = 1$, we find $c_i = (\sigma_a^2+\sigma^2/n_i)^{-1}\big/\sum_{i=1}^{k}(\sigma_a^2+\sigma^2/n_i)^{-1}$, giving for the
minimum variance $\Big\{\sum_{i=1}^{k}(\sigma_a^2+\sigma^2/n_i)^{-1}\Big\}^{-1} = J$, say. This is appropriate when the ratio $\sigma^2/\sigma_a^2$
is known, which is not the case here. Evaluation of the Hessian matrix leads to the same
value J, and we have therefore used it; the comparison of the variances of $\bar{x}'$ and $\bar{x}$ will still
be valid, and the values given for the efficiency of $\bar{x}'$ will, if incorrect, presumably be too low.
The abbreviation 'Ef' is to be read 'Efficiency of'.
The efficiencies to be expected with larger sizes of subsample may be judged simply by
adjusting the values of the ratio $\sigma^2/\sigma_a^2$ in Tables 1-4; for, writing $\sigma^2/\sigma_a^2 = r$, we have
$$\mathrm{Ef}(\bar{x}') = \Big\{\sum_{i=1}^{k}\Big(1+\frac{r}{n_i}\Big)^{-1}\Big\}^{-1}\Big\{\frac{1}{k}+\frac{r}{k^2}\sum_{i=1}^{k}\frac{1}{n_i}\Big\}^{-1}$$
and, similarly,
$$\mathrm{Ef}(\bar{x}) = \Big\{\sum_{i=1}^{k}\Big(1+\frac{r}{n_i}\Big)^{-1}\Big\}^{-1}\Big\{\frac{\sum_{i=1}^{k} n_i^2}{N^2}+\frac{r}{N}\Big\}^{-1},$$
so that multiplication of r and each
of the $n_i$'s by the same scalar quantity does not alter the efficiencies.





Table 1. k = 5, n_i = 1, 2, 2, 2, 5, respectively, i = 1, ..., 5

σ²/σ_a²     0       0.1     1       5       10      100     ∞
Ef (x̄')     1       0.999   0.974   0.887   0.845   0.781   0.772
Ef (x̄)      0.758   0.774   0.864   0.964   0.986   1.000   1

Table 2. k = 5, n_i = 2, 4, 5, 8, 10

σ²/σ_a²     0       0.1     1       5       10      100     ∞
Ef (x̄')     1       1.000   0.988   0.915   0.863   0.754   0.734
Ef (x̄)      0.805   0.812   0.862   0.946   0.974   0.999   1

Table 3. k = 5, n_i = 1, 5, 7, 15, 25

σ²/σ_a²     0       0.1     1       5       10      100     ∞
Ef (x̄')     1       0.999   0.944   0.720   0.596   0.368   0.325
Ef (x̄)      0.607   0.621   0.699   0.833   0.898   0.995   1

Table 4. k = 10, n_i = 1, 2, 2, 2, 2, 4, 5, 5, 8, 10

σ²/σ_a²     0       0.1     1       5       10      100     ∞
Ef (x̄')     1       0.999   0.970   0.843   0.773   0.650   0.629
Ef (x̄)      0.681   0.695   0.785   0.921   0.964   0.999   1
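The entries of these tables can be recomputed from the two efficiency formulae. The following Python sketch does so for Table 1; the function names are hypothetical, and $\sigma_a^2$ is set to unity so that $\sigma^2$ equals the ratio r (an assumption made only for convenience, since the efficiencies depend on the ratio alone).

```python
def min_variance(sizes, r):
    """Minimum variance J of a weighted mean of the subsample means,
    taking sigma_a^2 = 1 and sigma^2 = r (a scaling assumption)."""
    return 1.0 / sum(1.0 / (1.0 + r / n) for n in sizes)

def ef_unweighted(sizes, r):
    """Efficiency of the unweighted mean of the subsample means."""
    k = len(sizes)
    variance = 1.0 / k + (r / k ** 2) * sum(1.0 / n for n in sizes)
    return min_variance(sizes, r) / variance

def ef_grand(sizes, r):
    """Efficiency of the grand mean of all the observations."""
    total = sum(sizes)
    variance = sum(n ** 2 for n in sizes) / total ** 2 + r / total
    return min_variance(sizes, r) / variance

sizes = [1, 2, 2, 2, 5]            # subsample sizes of Table 1
ratios = [0, 0.1, 1, 5, 10, 100]   # finite values of sigma^2 / sigma_a^2
row_unweighted = [round(ef_unweighted(sizes, r), 3) for r in ratios]
row_grand = [round(ef_grand(sizes, r), 3) for r in ratios]
print(row_unweighted)
print(row_grand)
```

The limiting column for an infinite ratio is not evaluated here; it could be approached by taking r very large.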



































The importance of the ratio $\sigma^2/\sigma_a^2$ in deciding whether $\bar{x}'$ is a better estimator than $\bar{x}$ might
have been expected; for when $\sigma_a^2$ rather than $\sigma^2$ is the important component of variance we
have then, virtually, a random sample of k observations $a_1 \ldots a_k$ from the population
$N(a, \sigma_a^2)$ and the mean $\bar{x}$ approximates to $\sum_{i=1}^{k} n_i a_i\big/\sum_{i=1}^{k} n_i$, which is less efficient than the
unweighted mean $\sum_{i=1}^{k} a_i/k$. At the other end of the scale when $\sigma_a^2$ is negligible compared with
$\sigma^2$ we have, effectively, N random observations from $N(a, \sigma^2)$, and a similar situation occurs
with the roles of the two estimators reversed.







6. It seems clear that under favourable circumstances the use of $\bar{x}'$ can be recommended.
We now attempt to justify the statement that $t'$ is approximately distributed as Student's t.

For our present purposes the sample is to be regarded as consisting of observations $\bar{x}_1, \ldots, \bar{x}_k$,
each normally distributed about $a$, with variances respectively $\sigma_1^2 \ldots \sigma_k^2$, where $\sigma_i^2 = \sigma_a^2 + \sigma^2/n_i$
(i = 1, ..., k). The probability element associated with it is
$$\{(2\pi)^{\frac12 k}\sigma_1\cdots\sigma_k\}^{-1}\exp\Big\{-\tfrac12\sum_{i=1}^{k}\frac{(\bar{x}_i-a)^2}{\sigma_i^2}\Big\}\,d\bar{x}_1\cdots d\bar{x}_k.$$


The observations $\bar{x}_1 \ldots \bar{x}_k$ may be taken as the co-ordinates of a point, P say, in a
k-dimensional Euclidean hyperspace. Let us assume, as we may do without loss of generality,
that $a$ is zero. Consider the line OQ through the origin of co-ordinates O which makes equal
angles with all the axes, i.e. whose direction cosines are $(1/\sqrt{k}, \ldots, 1/\sqrt{k})$. It may be shown (see,
for example, Kendall, 1943) that if we define a statistic $z = \bar{x}'\Big\{\sum_{i=1}^{k}(\bar{x}_i-\bar{x}')^2/k\Big\}^{-\frac12}$, then
$$z = \cot\phi, \quad\text{where } \phi = \text{angle } POQ.$$
Also $z = t'/\sqrt{(k-1)}$. z is constant over the cone obtained by rotating OP about OQ, and its
probability distribution will be given by integrating the sample probability throughout
the volume between the cones defined by $z + \tfrac12 dz$ and $z - \tfrac12 dz$.
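The geometrical relation can be checked numerically. The sketch below, with hypothetical function name and data, takes $a$ as zero, computes z from its definition, computes $\cot\phi$ from the angle between OP and the equiangular line OQ, and also returns $t'$ so that the relation $z = t'/\sqrt{(k-1)}$ may be compared.

```python
from math import sqrt, acos, tan

def z_cot_and_t(means):
    """Return (z, cot_phi, t_prime) for a point P whose co-ordinates are the
    subsample means, the population mean a being taken as zero."""
    k = len(means)
    mean = sum(means) / k
    ss = sum((x - mean) ** 2 for x in means)
    z = mean / sqrt(ss / k)
    # Angle between OP and the line OQ with direction cosines 1/sqrt(k).
    length_op = sqrt(sum(x * x for x in means))
    cos_phi = sum(means) / (sqrt(k) * length_op)
    cot_phi = 1.0 / tan(acos(cos_phi))
    t_prime = mean / sqrt(ss / (k * (k - 1)))
    return z, cot_phi, t_prime

# Hypothetical subsample means.
z, cot_phi, t_prime = z_cot_and_t([0.3, -0.1, 0.8, 0.4])
print(z, cot_phi, t_prime)
```

For these values z and `cot_phi` should agree to rounding error, and `t_prime` should equal z multiplied by the square root of 3.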

If we apply the transformation $\bar{x}_i = \sigma_i\xi_i$ (i = 1, ..., k), the cone corresponding to any
particular value of z is transformed into a $\xi$-cone, and the probability density at the point
P is now proportional to $\exp(-\tfrac12 OP'^2)$, where P' is the transform of P. The probability
$\{|z| > |z_0|\}$ is therefore proportional to the integral of $\exp(-\tfrac12 OP'^2)$ throughout the volume
enclosed by the $\xi$-cone corresponding to $z_0$, the Jacobian of the transformation being
a constant. Now $\exp(-\tfrac12 OP'^2)$ is constant over the surface of the (k - 1)-dimensional hypersphere
with centre O and radius OP', and the volume cut off on this hypersphere by the
$\xi$-cone is equal to $OP'^{(k-1)}$ times the solid angle, $\omega$ say, of the $\xi$-cone. The probability
$\{|z| > |z_0|\}$ is therefore proportional to
$$\omega\int_0^{\infty} OP'^{(k-1)}\exp(-\tfrac12 OP'^2)\,d(OP'),$$
i.e. to $\omega$.

The problem of determining the sampling distribution of z now becomes that of determining
in terms of z the solid angle of the corresponding $\xi$-cone.

This problem, while simple in statement, presents heavy difficulties for values of k > 3, due 
largely to the fact that the £-cones are not cones of revolution; their sections by hyperplanes 
at right angles to their principal axes are hyperellipsoids. 

When k = 2 the cones degenerate into pairs of lines in two dimensions, and we easily find
that the required probability element is
$$f(z)\,dz = \frac{\beta}{2\pi}\,\frac{(z^2+1)\,dz}{z^4+(\beta^2-2)z^2+1},\quad\text{where } \beta = \frac{4\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}.$$

The sampling distribution of $t'$ is in this case obtained by writing $t'$ for z in the above.
When $\sigma_1 = \sigma_2$ this reduces, as it should, to the Student distribution for one degree of freedom,
viz.
$$\frac{1}{\pi}\,\frac{dt'}{1+t'^2}.$$










The case k = 3 is rather more complicated. The equation of the z-cone, in the original
variates, is
$$(\bar{x}_1^2+\bar{x}_2^2+\bar{x}_3^2)\,\alpha - 2(\bar{x}_1\bar{x}_2+\bar{x}_2\bar{x}_3+\bar{x}_3\bar{x}_1) = 0,$$
where
$$\alpha = (3\cos^2\phi-1) = \frac{2z^2-1}{z^2+1} = \frac{t'^2-1}{\tfrac12 t'^2+1},$$
and that of the corresponding $\xi$-cone is therefore
$$(\sigma_1^2\xi_1^2+\sigma_2^2\xi_2^2+\sigma_3^2\xi_3^2)\,\alpha - 2(\sigma_1\sigma_2\xi_1\xi_2+\sigma_2\sigma_3\xi_2\xi_3+\sigma_3\sigma_1\xi_3\xi_1) = 0.$$
Referred to the principal axes of the cone this takes the form
$$\lambda_1X^2+\lambda_2Y^2+\lambda_3Z^2 = 0.$$

From the geometry of such cones we can always arrange matters such that $\lambda_1 \ge \lambda_2 > 0$ and
$\lambda_3 < 0$ and the solid angle of the cone is then equal to
$$4\Big[\frac{\pi}{2}-\Big(\frac{W_2}{W_1}\Big)^{\frac12}\int_0^{\frac12\pi}\Big(\frac{1-W_1\sin^2\theta}{1-W_2\sin^2\theta}\Big)^{\frac12}d\theta\Big],$$
where for convenience we have written $W_1 = (\lambda_1-\lambda_2)/\lambda_1$ and $W_2 = (\lambda_1-\lambda_2)/(\lambda_1-\lambda_3)$. $W_1$ and
$W_2$ are both less than unity, provided that $\lambda_2$ and $\lambda_3$ are not both zero (in which case the cone
degenerates into a line and the solid angle is zero). Under these conditions the series expansions
of $(1-W_1\sin^2\theta)^{\frac12}$ and $(1-W_2\sin^2\theta)^{-\frac12}$ are uniformly convergent for all values of $\theta$, and we may
expand the integrand in a series in $\sin^2\theta$ and integrate term by term. To find the constant
of proportionality by which the result must be multiplied in order to convert it into a probability
we note that when $P\{|z| > |z_0|\} = 1$ the cone opens out into a plane and the solid
angle then becomes $2\pi$. The required constant is therefore $1/(2\pi)$. Making the appropriate
correction we find that
$$P\{|z|>|z_0|\} = 1-\Big(\frac{W_2}{W_1}\Big)^{\frac12}\Big\{1-(W_1-W_2)\Big[\tfrac14+\tfrac{3}{64}(W_1+3W_2)+\tfrac{5}{256}(W_1^2+2W_1W_2+5W_2^2)$$
$$\qquad\qquad +\tfrac{35}{16384}(5W_1^3+9W_1^2W_2+15W_1W_2^2+35W_2^3)+\ldots\Big]\Big\}.$$


An alternative form which converges more rapidly for low values of the probability is 


P{\z| >| 29 |} = -(F) { -3 (Ay) (a-mr-1-5 (Fg) a at |---}: 








this being obtained by writing the integral in the form 


beige) 





and expanding in a series. 
The $\lambda$'s, and hence the W's, are obtained by solving the discriminating cubic of the
$\xi$-cone, viz.
$$\lambda^3-\alpha(\sigma_1^2+\sigma_2^2+\sigma_3^2)\lambda^2+(\alpha^2-1)(\sigma_1^2\sigma_2^2+\sigma_2^2\sigma_3^2+\sigma_3^2\sigma_1^2)\lambda-(\alpha-2)(\alpha+1)^2\sigma_1^2\sigma_2^2\sigma_3^2 = 0.$$

The probability obviously depends on the relative and not the absolute magnitudes of the
$\sigma_i$'s. When they are all equal they may be equated to unity without affecting the result and
the cubic then has roots $(\alpha+1)$, $(\alpha+1)$ and $(\alpha-2)$. $W_1$ and $W_2$ are now both zero and
$$W_2/W_1 = \lambda_1/(\lambda_1-\lambda_3) = \tfrac13(\alpha+1) = z^2/(1+z^2).$$
The probability now becomes $1-z/(1+z^2)^{\frac12}$ and the probability element is therefore
$$f(z)\,dz = \frac{dz}{(1+z^2)^{\frac32}}.$$
As may be seen by substituting $t = \sqrt{2}\,z$, this agrees with the Student distribution for two
degrees of freedom.


Table 5. k = 2, n_1 = 2, n_2 = 5

σ²/σ_a²               0      0.1      1        10       100      ∞
P_e for P_a = 0.05    0.05   0.0500   0.0497   0.0472   0.0455   0.0452
P_e for P_a = 0.01    0.01   0.0100   0.0099   0.0094   0.0091   0.0090

Table 6. k = 2, n_1 = 1, n_2 = 7

σ²/σ_a²               0      0.1      1        10       100      ∞
P_e for P_a = 0.05    0.05   0.0500   0.0481   0.0386   0.0339   0.0332
P_e for P_a = 0.01    0.01   0.0100   0.0096   0.0077   0.0068   0.0066

Table 7. k = 3, n_1 = 2, n_2 = 3, n_3 = 4

σ²/σ_a²               0      0.1      1        10       100      ∞
P_e for P_a = 0.05    0.05   0.0500   0.0498   0.0486   0.0481   0.0478
P_e for P_a = 0.01    0.01   0.0100   0.0100   0.0097   0.0095   0.0094

Table 8. k = 3, n_1 = 2, n_2 = 5, n_3 = 10

σ²/σ_a²               0      0.1      1        10       100      ∞
P_e for P_a = 0.05    0.05   0.0500   0.0495   0.0446   0.0402   0.0394
P_e for P_a = 0.01    0.01   0.0100   0.0099   0.0087   0.0077   0.0076






































The probability functions found above are, unlike the Student distribution, dependent
on the values assumed by the $\sigma_i$'s. For purposes of comparison, therefore, it has been found
necessary to compute a number of probabilities for various sets of values of the $\sigma_i$'s. It has
been found convenient to calculate the exact probabilities corresponding to the approximate
ones given by assuming that $t'$ is distributed as Student's t; we denote these quantities by
$P_e$ and $P_a$ respectively. Some examples are given in Tables 5-8; in each case the values of
the ratio $\sigma^2/\sigma_a^2$ and of the $n_i$'s, on which $P_e$ depends, are given.



The following points may be noted: 

(i) The tables will give probabilities for values of the $n_i$'s other than those shown; Table 5,
for instance, may be adjusted to apply to the case $n_1 = 20$, $n_2 = 50$ by multiplying the values
of $\sigma^2/\sigma_a^2$ by 10.

(ii) As was to be expected, the errors involved in the approximation increase with increasing
dispersion among the $\sigma_i$'s, whether brought about by an increase in $\sigma^2/\sigma_a^2$ or by a widening
of the differences between the $n_i$'s.

(iii) The exact probabilities shown are never greater than the approximate ones. This is
certainly always true for the case k = 2, provided that the value of $t'$ in question is numerically
greater than unity. We may show this by considering the cumulative function of the distribution,
which is
$$\frac{1}{\pi}\tan^{-1}\Big(\frac{4\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}\,\frac{|t'|}{t'^2-1}\Big).$$
Then
$$P_a-P_e = \frac{1}{\pi}\Big\{\tan^{-1}\Big(\frac{2|t'|}{t'^2-1}\Big)-\tan^{-1}\Big(\frac{4\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}\,\frac{|t'|}{t'^2-1}\Big)\Big\}.$$
The expression in brackets is positive for $|t'| > 1$, since $2 > 4\sigma_1\sigma_2/(\sigma_1^2+\sigma_2^2)$, and the
result follows.
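For k = 2 the exact probability may be computed directly from the inverse-tangent expression. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, $\sigma_a^2$ is set to unity so that $\sigma^2$ equals the ratio r, and the percentage point of Student's t for one degree of freedom is obtained from the Cauchy form of that distribution.

```python
from math import atan, tan, pi, sqrt

def student_t1_point(p_two_sided):
    """Value t0 with P(|t| > t0) = p for Student's t with one degree of
    freedom, which is the Cauchy distribution."""
    return 1.0 / tan(pi * p_two_sided / 2.0)

def exact_tail_k2(t0, n1, n2, r):
    """Exact P(|t'| > t0) for k = 2 and t0 > 1, with sigma_a^2 = 1 and
    sigma^2 = r (a scaling assumption), so that sigma_i^2 = 1 + r / n_i."""
    if t0 <= 1.0:
        raise ValueError("the expression is used here only for t0 > 1")
    var1 = 1.0 + r / n1
    var2 = 1.0 + r / n2
    beta = 4.0 * sqrt(var1 * var2) / (var1 + var2)
    return atan(beta * t0 / (t0 * t0 - 1.0)) / pi

t05 = student_t1_point(0.05)
t01 = student_t1_point(0.01)
row_05 = [exact_tail_k2(t05, 2, 5, r) for r in (1, 10, 100)]
row_01 = [exact_tail_k2(t01, 2, 5, r) for r in (1, 10, 100)]
print(row_05)
print(row_01)
```

With $n_1 = 2$, $n_2 = 5$ the two rows may be compared with the corresponding finite-ratio columns of Table 5.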

Judging by the similarity of the pattern of the results for the case k = 3, a similar property
would appear to hold there also; although no proof has been found, a few calculations with
widely different values in the set of three $\sigma_i$'s which have been made confirm this conclusion.

It appears that any test based on the approximation will be a conservative one, in these
two cases at least.

The closeness of the approximation is somewhat remarkable. There were, of course,
a priori grounds for expecting good results when $\sigma^2$ is small compared with $\sigma_a^2$, for then each
$\bar{x}_i$ is approximately equal to its subpopulation mean $a_i$, and we have, virtually, a sample of
k values from the normal population of $a_i$'s. The above examples show that we can go rather
further than this; that for $\sigma^2$ small compared with $\sigma_a^2$ the approximate percentage points are
practically indistinguishable from the exact ones, and that even for $\sigma^2$ fairly large compared
with $\sigma_a^2$ the errors involved are not likely to be serious, from a practical point of view
(a stronger objection to the use of the approximation would be the comparative inefficiency
of $\bar{x}'$ in such a situation).

To obtain some idea of the behaviour of the approximation for larger values of k, we may
consider the moments of $t'$. Write
$$t' = (\bar{x}'-a)\Big/\Big\{\sum_{i=1}^{k}(\bar{x}_i-\bar{x}')^2/k(k-1)\Big\}^{\frac12}
= \Big\{\frac{k(k-1)}{A}\Big\}^{\frac12}(\bar{x}'-a)\Big\{1+\frac{\sum_{i=1}^{k}(\bar{x}_i-\bar{x}')^2-A}{A}\Big\}^{-\frac12},$$
where $A = E\sum_{i=1}^{k}(\bar{x}_i-\bar{x}')^2$ and is equal to $\{(k-1)/k\}\sum_{i=1}^{k}\sigma_i^2$. Expand $t'^2$, $t'^4$, etc., in series form
and take expectations (the arithmetical labour in this is much reduced by the use of tables
of symmetric functions such as those of David & Kendall (1949)). We find, writing
$$(r) = \sum_{i=1}^{k}\sigma_i^r \quad (r = 1, 2, \ldots),$$
$$\mu_1(t') = \mu_3(t') = 0,$$


matt) = 1—2(F) ap “e}*2 (Ea) ap 4+ (FE) O@+ Or 
5 


~‘te) a !208)+2(S~) +3 FE r-2(S) wer-Fer| 


“) ( aye 90(10) + 12(= -) 
151 


a PE) (6) 298+ 8 (=F) aye 2) + (15 — 302) (4) (2)° +7 (2) 


k- ) 








5k — i) (6) (4) 





(8)(2) +8 


eye 
rs 
—_ 
ck 





—8(— :) aye #80012) + 48( (10) (2) + 180 (= J ew-2(5*) (8) (2)? 


k 
+80 (“FS (6)?+ 20 (“Se 2") (6) (4) (2) + a alg i) (6) (2)8 
( 





k k 


eee ge oo 
2 a gms 


- Be) Heer +... 


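This is a function of the k variances $\sigma_1^2, \ldots, \sigma_k^2$. Treating this set as a sample of $\sigma^2$'s, and
writing $\lambda_r$ for the ratio of the rth moment about the mean to the rth power of the first
moment, i.e. setting
$$\lambda_r = \frac{1}{k}\sum_{i=1}^{k}\Big(\sigma_i^2-\frac{1}{k}\sum_{j=1}^{k}\sigma_j^2\Big)^r\Big/\Big(\frac{1}{k}\sum_{j=1}^{k}\sigma_j^2\Big)^r,$$
we find that the expression for $\mu_2(t')$ falls into the form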


[1 =Aa| + E+ 3 (s+ (0 +3)24)} — ats * (12A,+3(v—5) 8+ 2(V4 10) A, +3(304 12)2,)} 


a) are 





“ tas # + (6A, + 8(57—31) AgAy + 12(y + 21)A, +9? + 28y— 114) AR 


+ 24(3v + 8) A, + 2(3r2 + 53y + 12) r,)} 


_(32(5y +12) | 


. 5 (480Aq + 180(0— 7) AgAg + 80(v — 8) AZ + 48(v + 36) A, 


+ 20(v? + 37 — 234) AgA, + 360(v + 6) A, + 90(v? + 5y— 34) AB 
3 4 2 


+ 20(v2 + 38y + 48) As + 5r(19v + 132) r,)} + 


where $\nu = k-1$. Thus, to order $\nu^{-2}$,














In a similar way we find for the fourth moment of $t'$, to order $\nu^{-2}$,


1 q , 18 , 84 6A, 6(15A,— 8A, + 6A3) 
ee ee ee 2 





32 GA, 6(15A,— 8A, + 6A3) 
(v—2)(v—4) vp y2 , 





Apart from the corrective terms in $\lambda_r$, which depend on the relative variation among the
$\sigma_i$'s, these are the moments of Student's t with $\nu$ degrees of freedom. It appears, therefore,
that if the $\sigma_i$'s are not very different the approximation should, for fair-sized $\nu$, be reasonably
good. Indeed, the absence of a corrective term of order $\nu^{-1}$ in the second moment suggests
that the approximation should improve, on the whole, with increasing $\nu$; for the $\sigma_i$'s are of
bounded variation, since $\sigma_a^2 < \sigma_i^2 \le \sigma_a^2 + \sigma^2$, and hence the $\lambda$'s, though not constants, cannot
exceed certain limiting values.

In view of the accuracies obtained in the cases $\nu = 1$ and $\nu = 2$ above, it seems clear
therefore that the approximation is a reliable one. It is likely to go wrong only if, in addition
to fairly large variations among the subsample sizes being present, the variance $\sigma^2$ is large
compared with $\sigma_a^2$. It is worth noticing, however, that only infrequently will situations arise
where $\sigma^2$ is known to be much larger than $\sigma_a^2$; not because they may not exist, but because
they are difficult to detect. A large number of observations is required before the usual
between- and within-groups analysis of variance will establish the existence of a $\sigma_a^2$ which
is small compared with $\sigma^2$. Here we may also note that if, in a practical situation where this
approximation has been used, an error of judgement has been committed, in that $\sigma_a^2$ has
been taken to exist when in fact it does not, the consequences should not normally be very
serious. As may be seen by putting $\sigma_a^2$ zero in the above, the estimates used in the test are
still valid, and the main effect is a certain loss in efficiency.


7. The approximate solution just described applies to a rather wider class of problems
than that defined at the outset. The essential steps in the derivation of the moments of
$t'$, and of its sampling distribution in the two cases considered, do not require the variances
$\sigma_i^2$ to be of the form $(\sigma_a^2 + \sigma^2/n_i)$, where $\sigma_a^2$ and $\sigma^2$ are fixed. The distribution and moments
are appropriate to the case of infinite repetition of a sampling procedure leading to k independent
observations, normally distributed about the same mean, with variances respectively
$\sigma_1^2, \ldots, \sigma_k^2$. How the observations come to have these variances is a matter of indifference.
In the case we have considered, where the observations are subsample means, it is not
necessary that the two variance components $\sigma_a^2$ and $\sigma^2$ should be constant from one subsample
to another, or that $\sigma^2$ should be constant within any subsample; internal correlations
between any number of observations within a subsample could exist. Physical situations
where such complications might arise are not hard to imagine (e.g. in the example quoted
above, where different operators each make a number of determinations of some physical
constant, each operator's determinations might be made in sets of any number under
conditions varying from one set to another, with consequent between-set effects).

8. We conclude with an example of the application of the test to an analysis of variance 
problem. 

Suppose we have a 2 × m analysis of variance table, with unequal cell frequencies. Let the
cell means be $\bar{x}_{ij}$ respectively, where i = 1, 2 and j = 1, ..., m. Form the differences

$$d_j = \bar{x}_{1j} - \bar{x}_{2j} \qquad (j = 1, \ldots, m).$$





W. M. Lona 101 


Let $\bar{d} = \sum_{j=1}^{m} d_j/m$. Then on the usual assumptions of normality the $d_j$'s are normally distributed
with variances respectively $\sigma_j^2$ say, and we may therefore perform an approximate test for
differences in the i-classification by setting

$$t' = \frac{\bar{d}}{\sqrt{\sum_{j=1}^{m} (d_j - \bar{d})^2 / \{m(m-1)\}}}$$

and assuming that t' is distributed as Student's t.

This is equivalent to a test of the i-classification against the interaction, and the same result 
would have been reached by using the F-test in a simple analysis of variance based on the 
cell means and ignoring the complication of unequal cell frequencies. A little thought will 
show that the approximation involved should be a close one when a fair-sized interaction 
exists (since then a variance of the type of o2 above, fairly large in comparison with the 
within-cell variance, would be present), or, if the interaction is not very pronounced, when 
the cell frequencies are not too different. 
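
The calculation just described can be sketched in a few lines. This is an illustration only: the function name `t_prime` and the sample cell means are hypothetical, and the m − 1 degrees of freedom are an assumption inferred from the stated equivalence with a test against the interaction.

```python
import math


def t_prime(cell_means_1, cell_means_2):
    """Approximate t' for the i-classification of a 2 x m table of cell means.

    Returns (t', degrees of freedom). The degrees of freedom m - 1 are an
    assumption, taken from the interaction in a 2 x m layout.
    """
    if len(cell_means_1) != len(cell_means_2):
        raise ValueError("both rows must contain m cell means")
    m = len(cell_means_1)
    if m < 2:
        raise ValueError("at least two columns are needed")
    # Differences d_j between the two rows, column by column.
    d = [a - b for a, b in zip(cell_means_1, cell_means_2)]
    d_bar = sum(d) / m
    sum_sq = sum((dj - d_bar) ** 2 for dj in d)
    return d_bar / math.sqrt(sum_sq / (m * (m - 1))), m - 1


# Hypothetical cell means for a 2 x 4 table.
t_value, dof = t_prime([5.0, 6.0, 7.0, 9.0], [4.0, 4.0, 6.0, 6.0])
```

The cell frequencies do not enter the calculation, which is the sense in which the complication of unequal frequencies is ignored.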


REFERENCES 


David, F. N. & Kendall, M. G. (1949). Biometrika, 36, 431.
Kendall, M. G. (1943). The Advanced Theory of Statistics, 1, 239.










SOME TESTS FOR RANDOMNESS IN PLANT POPULATIONS 
By MARJORIE THOMAS 


1. INTRODUCTION 


This paper is concerned with a discussion of three tests designed to detect departure from 
randomness in the distribution of a plant species over an area. Material to which the tests 
may suitably be applied is obtained by quadrat sampling, as explained in a previous paper 
(Thomas, 1949). It is assumed that if the plants are randomly distributed, the resulting 
experimental data may be adequately represented by a Poisson series, and the tests therefore 
are designed to detect departure from this ideal state in the material collected. One of the 
main requirements of a test for such a purpose is that it shall be as sensitive as possible, and 
so the discussion will be largely concerned with the power of the tests under conditions 
departing from the hypothesis of randomness in a way to be specified. 


2. STEVENS’S TEST 


We begin by considering a test put forward by Stevens (1937) which is concerned with the 
number of quadrats which contain no plants. If there be N quadrats altogether, enclosing 
a total of S plants, and if Z of these quadrats be empty, the probability distribution of Z, and 
hence its mean and standard deviation, can be found on the hypothesis that the plants are 
randomly distributed. This will be a conditional probability distribution, assuming constant 
values for S and N; the mean and standard deviation of Z will therefore be functions of these 
two quantities. For any given sampling data, the statistic 


$$t = \frac{\text{Observed value of } Z - \text{Expected value of } Z}{\text{Standard deviation of } Z}$$


may be calculated and referred to the normal scale, which, Stevens has stated, approxi- 
mately graduates the distribution of Z for large samples. For small samples the exact 
sampling distribution of Z can be used. 

Since p(Z|S,N) = p(Z, S| N)/p(S|N), let us consider first the joint probability dis- 
tribution of Z and S. The probability of having S plants distributed among N quadrats in 
such a way that Z quadrats contain no plants is the sum, over all possible partitions of S into 
N − Z parts, of the probabilities that there shall be $y_1, y_2, \ldots, y_{N-Z}$ plants respectively in
N − Z quadrats, and the remaining Z quadrats shall be empty. The y's are partitions of
S, so that

$$\sum_{i=1}^{N-Z} y_i = S, \quad \text{and} \quad y_i > 0 \ \text{ for } i = 1, 2, \ldots, (N-Z).$$





Expressed mathematically, this statement is

$$p(Z, S \mid N) = \frac{N!}{Z!\,(N-Z)!}\, P_0^{Z} \sum_{y} \prod_{i=1}^{N-Z} P_{y_i}, \qquad (1)$$

where $P_k$ is the probability of observing k plants in any given quadrat, and $\sum_y$ denotes
summation over all sets of y's fulfilling the required conditions. We shall assume that the
distribution for $P_k$ can be represented by the Double Poisson law

$$P_k = \sum_{r=1}^{k} \frac{e^{-m} m^{r}}{r!}\, \frac{e^{-r\lambda} (r\lambda)^{k-r}}{(k-r)!} \quad \text{for } k > 0 \quad \text{and} \quad P_0 = e^{-m}, \qquad (2)$$







where m > 0 and $\lambda \ge 0$ are two parameters.* For this distribution, the mean of k is $M = m(1+\lambda)$.
This series is developed from the assumptions that the plants are distributed in clusters,
the number of clusters per quadrat following a Poisson distribution with mean m, and the
number of plants per cluster, additional to the first, following a Poisson distribution with
mean $\lambda$. If $\lambda = 0$, equation (2) reduces to the Poisson law, and the distribution of plants is
random without clustering. Thus formally the admissible hypotheses are specified by (2), and
we wish to test the composite hypothesis $H_0$ that $\lambda = 0$, m being unspecified. We are, therefore,
concentrating attention upon possible departures from randomness being only towards
a clustering tendency, and leaving aside consideration of a too regular distribution.
We may now re-write equation (1) and have

$$p(Z, S \mid N) = \frac{N!}{Z!\,(N-Z)!}\, e^{-mZ} \sum_{y} \prod_{i=1}^{N-Z} \left\{ \sum_{r=1}^{y_i} \frac{e^{-m} m^{r}}{r!}\, \frac{e^{-r\lambda} (r\lambda)^{y_i-r}}{(y_i-r)!} \right\}, \qquad (3)$$

where under the null hypothesis $H_0$, $\lambda = 0$.
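
As an illustration of the Double Poisson law (2), the following sketch evaluates $P_k$ directly from the defining sum and checks numerically that the probabilities sum to one, that the mean is $m(1+\lambda)$, and that $\lambda = 0$ gives the Poisson law. The function name and the parameter values are hypothetical.

```python
import math


def double_poisson_pmf(k, m, lam):
    """P_k of equation (2): r clusters (Poisson, mean m), each contributing
    its first plant plus a Poisson (mean lam) number of additional plants."""
    if k == 0:
        return math.exp(-m)
    total = 0.0
    for r in range(1, k + 1):
        clusters = math.exp(-m) * m ** r / math.factorial(r)
        # Python evaluates 0.0 ** 0 as 1.0, so lam = 0 leaves only the r = k term.
        extras = math.exp(-r * lam) * (r * lam) ** (k - r) / math.factorial(k - r)
        total += clusters * extras
    return total


# Illustrative parameter values; the series is truncated at k = 59.
m, lam = 0.5, 0.4
probs = [double_poisson_pmf(k, m, lam) for k in range(60)]
total_probability = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
```

With these values the truncated mean agrees with $m(1+\lambda) = 0.7$ to within rounding error.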

To find p(S | N) we sum p(Z, S | N) over all possible values of Z. The values which Z may
take differ according as S < N or S ≥ N. In the first case, Z may run between N − S and
N − 1, while in the second case the wider range from zero to N − 1 is possible. We have
therefore

$$p_1(S \mid N) = \sum_{Z=N-S}^{N-1} p(Z, S \mid N) \quad \text{for } S < N,$$

$$p_2(S \mid N) = \sum_{Z=0}^{N-1} p(Z, S \mid N) \quad \text{for } S \ge N,$$

where suffixes attached to the p's serve to distinguish between the two cases. So we obtain
finally

$$p(Z \mid S, N) = \frac{p(Z, S \mid N)}{p_1(S \mid N)} \ \text{ for } S < N, \qquad p(Z \mid S, N) = \frac{p(Z, S \mid N)}{p_2(S \mid N)} \ \text{ for } S \ge N, \qquad (4)$$

as the distribution of quadrats containing no plants when S and N are known. Equations
(3) and (4) have been expressed in a manner which does not lend itself to easy algebraic
manipulation. It has not so far been found possible to overcome this difficulty, except in the
case $\lambda = 0$. We proceed to consider the behaviour of p(Z, S | N) when this is so. It will be
convenient to quote here one or two of Stevens's results, which will be found useful a little
later on. The notation has been altered slightly from that of the original paper, to suit present
purposes.

If $\Delta^{N-Z} 0^{S}$ be the (N − Z)th leading difference of the series whose successive terms are Sth
powers of the natural numbers, beginning with zero, Stevens shows that

$$\sum_{y} \frac{1}{y_1!\, y_2! \cdots y_{N-Z}!} = \frac{\Delta^{N-Z} 0^{S}}{S!} \qquad (5)$$

and

$$\sum_{Z=0}^{N-1} \frac{N!\, \Delta^{N-Z} 0^{S}}{Z!\,(N-Z)!} = N^{S}, \qquad (6)$$

where the y's, as above, are partitions of S into N − Z parts. We note that if S < N, some terms
of the series in equation (6) are zero, since $\Delta^{N-Z} 0^{S} = 0$ for N − Z > S. Stevens further finds


* The simplest way of fitting this series to observed data is by use of the recurrence relation

$$P_k = \sum_{t=1}^{k} \frac{m\, t\, e^{-\lambda} \lambda^{t-1}}{k\,(t-1)!}\, P_{k-t}.$$













the probability distribution of empty quadrats when a total of S plants is distributed at
random among N quadrats to be

$$p(Z \mid S, N) = \frac{N!\, \Delta^{N-Z} 0^{S}}{Z!\,(N-Z)!\, N^{S}}. \qquad (7)$$

From this distribution we can find the mean and variance of Z, namely,

$$\mu_1'(Z) = N\left(1 - \frac{1}{N}\right)^{S},$$

$$\mu_2(Z) = N\left\{ (N-1)\left(1 - \frac{2}{N}\right)^{S} + \left(1 - \frac{1}{N}\right)^{S} - N\left(1 - \frac{1}{N}\right)^{2S} \right\}. \qquad (8)$$

If we now start from a different point, and consider the observed data as a sample drawn
from a population which follows a Poisson distribution with mean M, it is easy to show that
we arrive at the result given by equation (7), whatever value M may have. For if, in equation
(3), we put $\lambda = 0$, m = M, we have




















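$$p(Z, S \mid N) = \frac{N!}{Z!\,(N-Z)!}\, e^{-MZ} \sum_{y} \prod_{i=1}^{N-Z} \frac{e^{-M} M^{y_i}}{y_i!}$$

$$= \frac{N!}{Z!\,(N-Z)!}\, e^{-NM} M^{S} \sum_{y} \frac{1}{y_1!\, y_2! \cdots y_{N-Z}!}$$

$$= \frac{N!}{Z!\,(N-Z)!}\, e^{-NM} M^{S}\, \frac{\Delta^{N-Z} 0^{S}}{S!} \quad \text{by equation (5).}$$

Hence

$$p(S \mid N) = e^{-NM} M^{S} \sum_{Z} \frac{N!\, \Delta^{N-Z} 0^{S}}{Z!\,(N-Z)!\, S!} = \frac{e^{-NM} M^{S} N^{S}}{S!} \quad \text{by equation (6).}$$

If S < N,

$$\sum_{Z=0}^{N-S-1} \frac{N!\, \Delta^{N-Z} 0^{S}}{Z!\,(N-Z)!\, S!} = 0,$$

every term in the summation vanishing separately,
so that we finally have

$$p(Z \mid S, N) = \frac{p(Z, S \mid N)}{p(S \mid N)} = \frac{N!\, \Delta^{N-Z} 0^{S}}{Z!\,(N-Z)!\, N^{S}},$$

a result which agrees with that of Stevens (equation (7)), and which is also seen to be
independent of M.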
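
The distribution (7) is easily evaluated exactly. The sketch below is an illustration, not Stevens's own computing scheme: it obtains the leading differences $\Delta^{n} 0^{S}$ from the usual alternating-sum expression (an assumption of this example), and uses exact rational arithmetic to check the moments (8). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def leading_difference(n, s):
    """Delta^n 0^s, the n-th leading difference of 0^s, 1^s, 2^s, ...;
    equivalently the number of ways of placing s distinct plants in
    n quadrats so that none is empty."""
    return sum((-1) ** (n - j) * comb(n, j) * j ** s for j in range(n + 1))


def stevens_pmf(z, s, n):
    """p(Z | S, N) of equation (7), as an exact fraction."""
    return Fraction(comb(n, z) * leading_difference(n - z, s), n ** s)


def upper_tail(z0, s, n):
    """P{Z >= z0 | S, N} when the plants are distributed at random."""
    return sum(stevens_pmf(z, s, n) for z in range(z0, n))


# Case (i) of the text: S = 5 plants in N = 25 quadrats.
S, N = 5, 25
dist = [stevens_pmf(z, S, N) for z in range(N)]
mean = sum(z * p for z, p in enumerate(dist))
variance = sum(z * z * p for z, p in enumerate(dist)) - mean ** 2
tail_22 = upper_tail(22, S, N)
```

For S = 5 and N = 25 the probability of 22 or more empty quadrats is 354025/9765625, or about 0.036, which agrees with the first entry of Table 1.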

Taking now the Double Poisson series, we may consider the power of Stevens's test with
respect to alternative distributions defined by parameters $\lambda$ and $M = m(1+\lambda)$. If $Z_\alpha$ be
taken as the upper significance level for Z, i.e. is chosen so that $P\{Z \ge Z_\alpha \mid \lambda = 0, N, S\} \le \alpha$
(an expression independent of the unknown M), where $\alpha$ has some convenient value, then
the power function of the test is given by

$$P\{Z \ge Z_\alpha \mid \lambda, M, N, S\} = \frac{\sum_{Z=Z_\alpha}^{N-1} p(Z, S \mid N)}{p(S \mid N)},$$

since p(S | N) is independent of Z. This is a conditional power function within the set of
samples with fixed S.

The calculation of the power function is a somewhat laborious task for large values of
S and N. To make comparison with other tests, it has however been evaluated in the two cases

(i) S = 5, N = 25; (ii) S = 10, N = 25;



















taking the level of the probability of rejection, given $H_0$, at about $\alpha = 0.03$ in each case,
M being chosen so that the population and sample means were equal, that is, M = 0.2 for
(i) and M = 0.4 for (ii). It is recognized that this is an artificial situation, and that it would
be desirable to obtain some knowledge of the behaviour of the power function when the sample
mean differs from its expected value. In particular, we would like to consider the 'overall'
power function, i.e.

$$\sum_{S} p(S \mid N)\, P\{\text{rejection of } H_0 \mid S, N\}.$$
[Figure: power plotted against λ from 0.0 to 1.4 for N = 25, with curves for M = 0.2, S = 5 and for M = 0.4, S = 10, and plotted points for the estimate of 'overall' power, M = 0.4.]

Fig. 1. Power of Stevens's test.

Table 1. The power of Stevens's test

          Exact 'conditional' power      Estimate of 'overall' power
   λ      M = 0.2       M = 0.4          M = 0.4
  0.0     0.036         0.033            0.030
  0.2     0.261         0.251            0.180
  0.4     0.515         0.543            0.480
  0.6     0.719         0.779            0.680
  0.8     0.850         0.905            0.790
  1.0     0.925         0.965            0.875
  1.2     0.962         0.988            0.900
  1.4     0.983         0.997            0.930

















Owing to the difficulties presented by discrete distributions, this quantity cannot be handled
algebraically, and its exact computation would be very laborious, so estimates of its value
for the case M = 0.4 and various values of $\lambda$ were found by a sampling experiment. For each
value of $\lambda$, 200 samples of 25 were drawn from the distribution of equation (2) by means of
random sampling numbers (Kendall & Babington Smith, 1939), and Stevens's test applied
to each sample. The proportion of samples which led to the rejection of $H_0$ was taken as an
estimate of the overall power of the test. Table 1 shows the results of these calculations, which
are also presented graphically in Fig. 1. Comparison in the case of M = 0.4 of conditional







and overall powers suggests that the power is at or near its optimum when the sample mean
has its expected value. Observed differences for the curves for M = 0.2 and 0.4 suggest also
that the power varies considerably with M. This is a drawback, as M will in general be
unknown, and so for any given case we shall have little or no idea of the power of the test
if the hypothesis of randomness be false, even although the probability of rejecting the
hypothesis when it is in fact true is independent of M. For samples as small as 25, we see that
the power is not very great*, but we might expect it to be considerably higher for larger
samples, say of 100 or 200 quadrats, although the test can never be as powerful as those
which take into account the whole plant distribution.
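
The conditional power for small S and N can be evaluated by machine without enumerating the partitions. The sketch below is one possible arrangement, not the computing scheme used for Table 1: it takes p(Z, S | N) as the binomial coefficient times $P_0^Z$ times the coefficient of $x^S$ in $(\sum_{k \ge 1} P_k x^k)^{N-Z}$, which is the sum in equation (1) written as a polynomial product. The function names are hypothetical, and the critical value Z = 22 for S = 5, N = 25 is an assumption inferred from the null distribution (7), under which its size is about 0.036.

```python
import math


def double_poisson_pmf(k, m, lam):
    """P_k of equation (2)."""
    if k == 0:
        return math.exp(-m)
    return sum(
        math.exp(-m) * m ** r / math.factorial(r)
        * math.exp(-r * lam) * (r * lam) ** (k - r) / math.factorial(k - r)
        for r in range(1, k + 1)
    )


def joint_prob(z, s, n, pmf):
    """p(Z, S | N) of equation (1) for a given list pmf[0..s] of P_k."""
    poly = [1.0] + [0.0] * s  # coefficients of x^0 .. x^s, starting from 1
    for _ in range(n - z):    # multiply by sum_{k>=1} P_k x^k, truncated at x^s
        new = [0.0] * (s + 1)
        for i, coeff in enumerate(poly):
            if coeff == 0.0:
                continue
            for k in range(1, s - i + 1):
                new[i + k] += coeff * pmf[k]
        poly = new
    return math.comb(n, z) * pmf[0] ** z * poly[s]


def conditional_power(z_crit, s, n, mean, lam):
    """P{Z >= z_crit | lam, M, N, S}, with m chosen so that m(1 + lam) = M."""
    m = mean / (1.0 + lam)
    pmf = [double_poisson_pmf(k, m, lam) for k in range(s + 1)]
    joint = [joint_prob(z, s, n, pmf) for z in range(n)]
    return sum(joint[z_crit:]) / sum(joint)


size = conditional_power(22, 5, 25, 0.2, 0.0)       # lam = 0: size of the test
power_02 = conditional_power(22, 5, 25, 0.2, 0.2)   # a clustered alternative
```

At $\lambda = 0$ the result does not depend on the value given to M, as the argument following equation (7) requires.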


3. INDEX OF DISPERSION 


We will next consider a test for heterogeneity which is based on a comparison of the sample
mean and variance. If the null hypothesis, that the plants are randomly distributed, is true,
then the population may be described by a simple Poisson distribution, having its mean and
variance equal. Alternatively, if there is a tendency to cluster, and we make use of a Double
Poisson series to describe observed data, then the variance will be greater than the mean.
It seems reasonable, therefore, to take the ratio of sample variance to mean as a measure of
heterogeneity. A multiple of this ratio, the index of dispersion, will be denoted by the letter
z, so that

$$z = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{\bar{x}},$$

where $x_i$ is the number of plants contained within the ith quadrat, and $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$. N is the
total number of quadrats. A test suitable for our purposes is simply obtained by choosing
a number $z_0$ and rejecting the hypothesis of random distribution if z, as calculated from the
observed sample, exceeds this value. We need therefore to know the probability that, in
chance sampling fluctuations, z will exceed $z_0$, both when the plants are distributed at random
and when this is not the case. That is, we need the probability distribution of z. This has long
been known to be approximated by a $\chi^2$ distribution with (N − 1) degrees of freedom, when
the x's are independent variables following a Poisson law. Hoel (1943) has given the first
four moments of z in terms of the cumulants of the parent distribution from which the x's are
drawn, and confirmed the earlier result. His procedure has been to determine which, if any,
of the Pearson type curves, as judged by moments, is applicable as an approximation to the
distribution of z, and the method is equally suitable when the parent distribution is not of
Poisson form. Bateman (1950) has applied it to the case of a Neyman's contagious population.
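
By way of illustration, the index of dispersion and an approximate tail probability may be computed as follows. The Wilson-Hilferty cube-root approximation to the $\chi^2$ integral is used here only to keep the sketch self-contained; it is an assumption of this example and is not the method of the text. The function names and the quadrat counts are hypothetical.

```python
import math
from statistics import NormalDist


def index_of_dispersion(counts):
    """z = sum (x_i - xbar)^2 / xbar for the plant counts of N quadrats."""
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("z is undefined when no plants were observed")
    return sum((x - mean) ** 2 for x in counts) / mean


def chi2_upper_tail(z, dof):
    """Approximate P{chi-square with dof degrees of freedom >= z}
    by the Wilson-Hilferty normal approximation."""
    c = 2.0 / (9.0 * dof)
    deviate = ((z / dof) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 1.0 - NormalDist().cdf(deviate)


counts = [0, 1, 2, 3, 4]  # hypothetical counts from five quadrats
z = index_of_dispersion(counts)
p_value = chi2_upper_tail(z, len(counts) - 1)
```

The hypothesis of randomness would be rejected when `p_value` falls below the chosen significance level, that is, when z exceeds $z_0$.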

The moments of z as used by Hoel are not exact, but approximations derived by the use 
of Fisher’s (1928) k-statistic technique. Following David’s (1949) procedure, the first three 
moments are obtained as follows: 


Pe) ee | 8 Ni +yile ~ ew 
15x} ay 
“7 a CS a ns 
* The sampling experiment referred to above has since been extended and it is hoped later to
provide a direct comparison of the power of Stevens's test and of that based on the index of dispersion,
for the case N = 25.

























ic sa KR 23K, | 1 One. 1 6x3 “tt! 
Hal2) = (ye a ty ar WTI ROK, 
1 (8x2 18K3K} | Tkyk, | 5K5 sa 
+ «A ihe ek ah” 









































~ 56xK,K3 12K,k, 1 | 
Py +-—33 
+ NH atm TR 
+ 1 {69x} _ baile . 64«,«3 r 84K5K, 18K;Ky 26KyKs + 3K 
Wg @ * a t+ A WO RR 
“a (N—1)8[ 1 (6x3 19K3K}  DKyKZ  L2KZK, BKK, BKgKks pte } 
Bs N? A Mele Ky Ky , 
1 12x} 24K,%3 \ 1 
eh eS ip 12 8x2! +. {8x3 — 4x2 
+NW-1) x + 12kyky+ i) +e 3 — 4x2} 
Pa. F L16Kg 399K 3K3 | 168K ,K3 | 336x543 BIK;Kz 177K yK3Ke 
N®\ Ki Ki Ki Ki Ki Ki 
44K3 | L5kgKk, | 30K5Ks 15Kj 3K; 
Cs aS Os as MS 





1 ere 366x,K}  138K,K}  204K3K, 36K;K, 84K,4K3| 





‘yqN-)| 4 ~ ae Ar os 
; 1 48K} 24K5 Ke " 24K,K, 72K,K3 
N(N —1)?\ x? K? Ky Ki ‘ 


Terms within the square brackets of order higher than the third in 1/N have been omitted.
The $\kappa$'s are cumulants of the parent population. We shall confine ourselves to consideration
of the distribution of z when the parent population is represented by a Double Poisson series,
having the following cumulants:

$$\kappa_1 = m(1 + \lambda),$$

$$\kappa_2 = m(1 + 3\lambda + \lambda^2),$$

$$\kappa_3 = m(1 + 7\lambda + 6\lambda^2 + \lambda^3),$$

$$\kappa_4 = m(1 + 15\lambda + 25\lambda^2 + 10\lambda^3 + \lambda^4),$$

$$\kappa_5 = m(1 + 31\lambda + 90\lambda^2 + 65\lambda^3 + 15\lambda^4 + \lambda^5),$$

$$\kappa_6 = m(1 + 63\lambda + 301\lambda^2 + 350\lambda^3 + 140\lambda^4 + 21\lambda^5 + \lambda^6),$$

$$\kappa_7 = m(1 + 127\lambda + 966\lambda^2 + 1701\lambda^3 + 1050\lambda^4 + 266\lambda^5 + 28\lambda^6 + \lambda^7).$$

(In the special case $\lambda = 0$, when the series reduces to a simple Poisson, it will be noted that
these expressions all become equal to m.)
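
The coefficients in these cumulants can be generated mechanically. The sketch below rests on an observation that is not made in the text and should be read as an assumption of the example: the coefficient of $\lambda^j$ in $\kappa_r/m$ coincides with the Stirling number of the second kind S(r + 1, j + 1), which reproduces every expression listed above. The function names are hypothetical.

```python
def stirling2(n, k):
    """Stirling number of the second kind, by the usual recurrence
    S(i, j) = j * S(i - 1, j) + S(i - 1, j - 1)."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]


def cumulant_coefficients(r):
    """Coefficients of 1, lam, lam^2, ..., lam^r in kappa_r / m."""
    return [stirling2(r + 1, j + 1) for j in range(r + 1)]


def cumulant(r, m, lam):
    """kappa_r of the Double Poisson series with parameters m and lam."""
    return m * sum(c * lam ** j for j, c in enumerate(cumulant_coefficients(r)))


coeffs_7 = cumulant_coefficients(7)
```

Setting `lam` to zero returns m for every order, as noted in the text.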

Numerical values of $\mu_1'(z)$, $\mu_2(z)$ and $\mu_3(z)$ have been computed for N = 100 and N = 200,
and $M = m(1+\lambda)$ = 1, 2 and 3 for values of $\lambda$ ranging from 0.0 to 0.6. Distributions
suggesting themselves as suitable to represent that of z are Pearson's Type III and the
log-normal distribution, fitted in each case by taking a zero start and equating the first two
moments to those of z obtained as above. The ratio $\mu_3(z)$/(third moment of fitted curve) was










calculated for some representative values of M, N and $\lambda$, the results indicating that the
log-normal was the better choice in this instance, and it was therefore used to obtain the
powers shown in Table 2. Powers were also calculated from the fitted Type III curves, and
differed very little from those in Table 2, a fact which leads us to hope that the latter are an


Table 2. Power of the index of dispersion test

                    N = 100                        N = 200
    λ         M = 1    M = 2    M = 3        M = 1    M = 2    M = 3
   0.0        0.050    0.050    0.050        0.050    0.050    0.050
   0.1        0.345    0.343    0.342        0.535    0.537    0.539
   0.2        0.691    0.702    0.705        0.910    0.920    0.923
   0.3        0.887    0.900    0.905        0.990    0.993    0.994
   0.4        0.965    0.973    0.975        0.999      —        —
   0.5        0.990    0.993    0.994          —        —        —
   0.6        0.997      —        —            —        —        —
 log_e z_0    4.817    4.817    4.817        5.452    5.453    5.453

[Figure: power plotted against λ from 0.0 to 0.5, with curves for N = 100 and N = 200.]

Fig. 2. Power of index of dispersion.


adequate approximation to the true values. We further suggest that, as an alternative to
referring $\{\log z - E(\log z)\}/\sigma(\log z)$ to tables of the unit normal variable (the procedure
leading to Table 2), a simpler procedure will be to refer z to a $\chi^2$ distribution with appropriate
degrees of freedom. For N greater than 100, the power of the two tests will not differ
appreciably. It will be seen that variation in the overall density of the plants, that is,
variation of M, has little effect on either the power of the test or the appropriate significance
level for the cases considered. Sample size influences the power to some extent, as might
have been expected. A graphical comparison is made in Fig. 2, where the values are drawn
for M = 2. The powers are too close together for all the curves to be drawn, and those for
M = 2 were selected as being the central ones of those calculated.










4. TEST BASED ON MAXIMUM LIKELIHOOD ESTIMATE OF λ


The last of the tests for randomness which will be considered here is based on a property of
maximum likelihood estimates, namely, that such estimates will be consistent and have
a sampling distribution which tends to normality for large samples. If l is the maximum
likelihood estimate of $\lambda$ based on the observed frequencies of the zero and unit classes,
$n_0$ and $n_1$, it was shown in a previous paper (Thomas, 1949) that

$$l = -\log_e \left[ \frac{-n_1}{n_0 \log_e (n_0/N)} \right].$$

Further, in the large sample limit, the standard error of l is given by

$$\sigma_l^2 = \frac{m - e^{-m} e^{-\lambda} + e^{-\lambda} (1-m)^2}{N m^2 e^{-m} e^{-\lambda}},$$

which, when $\lambda = 0$, assumes the value

$$\sigma_0^2 = \frac{1 - M + M^2 - e^{-M}}{N M^2 e^{-M}}.$$

Hence the power of the test of the hypothesis that $\lambda = 0$, with respect to alternatives $\lambda > 0$,
is given by

$$P\{l \ge l_\alpha \mid \lambda\} = \frac{1}{\sqrt{2\pi}} \int_{(l_\alpha - \lambda)/\sigma_l}^{\infty} e^{-\frac{1}{2}z^2}\, dz,$$

where $\alpha = \frac{1}{\sqrt{2\pi}} \int_{l_\alpha/\sigma_0}^{\infty} e^{-\frac{1}{2}z^2}\, dz$ is the significance level chosen. Values of this power function
are shown below in Table 3 and represented graphically in Fig. 3. A 5 % significance level has


Table 3. Power of test based on maximum likelihood estimate of λ

               N = 100                    N = 200                    N = 300
   λ     M = 1   M = 2   M = 3      M = 1   M = 2   M = 3      M = 1   M = 2   M = 3
  0.0    0.050   0.050   0.050      0.050   0.050   0.050      0.050   0.050   0.050
  0.1    0.197   0.099   0.058      0.292   0.137   0.074      0.377   0.172   0.087
  0.2    0.456   0.197   0.080      0.682   0.324   0.126      0.821   0.439   0.172
  0.3    0.711   0.352   0.122      0.916   0.588   0.227      0.978   0.754   0.334
  0.4    0.872   0.542   0.193      0.985   0.818   0.387      0.998   0.937   0.563
  0.5    0.948   0.720   0.297      0.998   0.942   0.585        —     0.990   0.784
  0.6    0.979   0.850   0.431        —     0.986   0.769        —       —     0.924
  0.7    0.991   0.927   0.576        —     0.997   0.895        —       —     0.981
  0.8      —     0.966   0.710        —       —     0.961        —       —     0.996
  1.0      —     0.993   0.891        —       —     0.996        —       —       —
  1.2      —       —     0.965        —       —       —          —       —       —
  1.4      —       —     0.989        —       —       —          —       —       —

Table 4. Values of l_0.05

   M     N = 100   N = 200   N = 300
   1      0.216     0.152     0.124
   2      0.378     0.268     0.218
   3      0.648     0.458     0.374































been taken for all cases. Values of $l_{0.05}$ are shown in Table 4. It will be noticed that the power
of the test increases with sample size, which is to be expected. A less desirable variation is
that which is observed between different values of plant density for a constant size sample.
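
The quantities entering this test are simple to compute. The following sketch, with hypothetical function names, evaluates the estimate l, the large-sample variance of l, the critical value $l_\alpha$ and the normal-approximation power, taking M as known and m = M/(1 + λ) under the alternative.

```python
import math
from statistics import NormalDist


def lambda_estimate(n0, n1, n_quadrats):
    """l, the estimate of lambda from the zero and unit class frequencies."""
    m_hat = -math.log(n0 / n_quadrats)
    return -math.log(n1 / (n0 * m_hat))


def variance_of_l(m, lam, n_quadrats):
    """Large-sample variance of l for parameters m and lambda."""
    joint = math.exp(-m - lam)
    numerator = m - joint + math.exp(-lam) * (1.0 - m) ** 2
    return numerator / (n_quadrats * m ** 2 * joint)


def critical_value(mean, n_quadrats, alpha=0.05):
    """l_alpha, computed under lambda = 0 with m = M."""
    sd0 = math.sqrt(variance_of_l(mean, 0.0, n_quadrats))
    return NormalDist().inv_cdf(1.0 - alpha) * sd0


def power(mean, n_quadrats, lam, alpha=0.05):
    """P{l >= l_alpha | lambda}, holding M = m(1 + lambda) fixed."""
    m = mean / (1.0 + lam)
    sd = math.sqrt(variance_of_l(m, lam, n_quadrats))
    deviate = (critical_value(mean, n_quadrats, alpha) - lam) / sd
    return 1.0 - NormalDist().cdf(deviate)


l_crit = critical_value(1.0, 100)   # compare the entry for M = 1, N = 100 in Table 4
beta = power(1.0, 100, 0.1)         # compare Table 3, lambda = 0.1, M = 1, N = 100
```

For M = 1 and N = 100 these give approximately 0.216 and 0.197, in agreement with the corresponding entries of Tables 4 and 3.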


Fig. 3. Power of λ-test.


In practice the value of M is often unknown and must be estimated, and here it is necessary
to add a word of caution. If estimates $\hat{M}$, l of M and $\lambda$ be made from the same sample, these
estimates have a strong positive correlation. If $H_0$ be true, it is

$$\rho = \frac{M^2 - 2M + 2 - 2e^{-M}}{\sqrt{(M^2 - 3M + 4 - 4e^{-M})(M^2 - M + 1 - e^{-M})}},$$

which, for example, is 0.846 when M = 2. Thus if an estimate $\hat{M}$ of M be used to calculate
a value of $l_\alpha$, the probability that l, when calculated from the same sample, will exceed $l_\alpha$ will
very infrequently be even approximately equal to $\alpha$. For various values of M and N,
and $\alpha = 0.05$, some values of this probability were worked out, taking $\hat{M}$ in each case
as differing from its population value by one standard deviation. Results ranged from
0.079 (M = 1, $\hat{M} = M + \sigma_{\hat{M}}$, N = 300) to a quantity negligible to three decimal places
(M = 3, N = 100, 200, 300).
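
The quoted value of the correlation is readily checked numerically. The function name below is hypothetical.

```python
import math


def correlation_of_estimates(mean):
    """Correlation between the estimates of M and lambda when lambda = 0."""
    e = math.exp(-mean)
    numerator = mean ** 2 - 2.0 * mean + 2.0 - 2.0 * e
    denominator = math.sqrt(
        (mean ** 2 - 3.0 * mean + 4.0 - 4.0 * e) * (mean ** 2 - mean + 1.0 - e)
    )
    return numerator / denominator


rho_at_2 = correlation_of_estimates(2.0)
```

At M = 2 this gives a value which rounds to 0.846.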

Alternatively, if $\hat{M}$ and l are calculated from different samples, and are therefore inde-
pendent, an error in $\hat{M}$ of one standard deviation in either direction has less effect. From
calculations made, it would appear that for M small (<1) and N large (>200), an error of
this magnitude in $\hat{M}$ will not affect the significance level when $\alpha = 0.05$ by more than 0.01.

Since so little information is needed for the application of this test, we can hardly expect
it to be as powerful as others which use the whole range of the sample distribution. In the
absence of more complete knowledge, the maximum likelihood method provides a quick, if
rough, estimate of the values of M and $\lambda$, but when the true value of M is unknown, the
suggested test based on these estimates must be used with caution.













5. COMPARISON OF THE TESTS 


The third of the three tests described, namely, that based on the maximum likelihood
estimate of $\lambda$, when only the values $n_0$, $n_1$ and N are known, is strictly not comparable with
the first two, since it supposes a knowledge of M, the population mean, which they do not.
Further, it is only applicable to large samples. Stevens's test and the index of dispersion avoid
the first, and Stevens's test the second of these difficulties, but the theoretical treatment of
both is awkward, in the one case for large, and in the other for small samples. As far as the
present investigation has gone, it appears that the index of dispersion is the statistic to be
preferred in testing for randomness in plant populations. The fact that, for large samples, the
power of the test varies so little with M is a useful point; in addition, z is easily calculated
even without the aid of tables and means of mechanical calculation.


6. SUMMARY 


Three tests are described which may be employed to detect departure from randomness in 
certain plant distributions. The power curves of these tests are plotted under certain specified
conditions of departure from the null hypothesis. Some attempt at assessing the relative 
merits of the tests is made. 


I wish to express my grateful thanks to Prof. E.S. Pearson and Dr F. N. David for their help 
in the preparation of this paper. 
REFERENCES 


Bateman, G. (1950). Biometrika, 37, 59.
David, F. N. (1949). Biometrika, 36, 383.
Fisher, R. A. (1928). Proc. Lond. Math. Soc. 30, 199.
Hoel, P. G. (1943). Ann. Math. Statist. 14, 155.
Kendall, M. G. & Babington Smith, B. (1939). Tracts for Computers, no. 24.
Stevens, W. L. (1937). Ann. Eugen., Lond., 8, 57.
Thomas, Marjorie (1949). Biometrika, 36, 18.













CHARTS OF THE POWER FUNCTION FOR ANALYSIS OF VARIANCE 
TESTS, DERIVED FROM THE NON-CENTRAL F-DISTRIBUTION 


By E. S. PEARSON AND H. O. HARTLEY


1. THE DISTRIBUTION OF NON-CENTRAL χ²


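If $u_i$ (i = 1, 2, ..., $\nu$) form a set of $\nu$ random variables each distributed normally and indepen-
dently with unit variance about a mean zero, and if $a_i$ (i = 1, 2, ..., $\nu$) are $\nu$ fixed constants,
then

$$\chi'^2 = \sum_{i=1}^{\nu} (u_i + a_i)^2 \qquad (1)$$

is known to have the following probability density function:

$$p(\chi'^2) = \frac{e^{-\frac{1}{2}\chi'^2}\, e^{-\frac{1}{2}\lambda}}{2^{\frac{1}{2}\nu}} \sum_{j=0}^{\infty} \frac{(\chi'^2)^{\frac{1}{2}\nu + j - 1}\, \lambda^{j}}{\Gamma(\frac{1}{2}\nu + j)\, 2^{2j}\, j!}, \qquad (2)$$

where

$$\lambda = \sum_{i=1}^{\nu} a_i^2. \qquad (3)$$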


The function (2) has been termed the distribution of non-central $\chi^2$ having $\nu$ degrees of free-
dom and $\lambda$ has been termed the non-centrality parameter. The distribution (2) was first
obtained by Fisher (1928); in his notation $B^2 = \chi'^2$ and $\beta^2 = \lambda$. Fisher gave a table of the
upper 5 % values of B for $\nu$ = 1(1)7, $\beta$ = 0(0.2)5.0. Subsequently in a Ph.D. Thesis for the
University of London (unpublished, 1934), F. Garwood provided a table of the lower 5 %
points of B.

If the $a_i = 0$, $\chi'^2$ becomes an ordinary central $\chi^2$. In certain problems it is useful to know,
for a given $\lambda$, the probability that $\chi'^2$ will exceed a particular percentage point, say the 5 %
point, of the ordinary $\chi^2$ distribution. This probability expressed as a function of $\lambda$ gives the
power function of the $\chi^2$ test. Appropriate tables have been computed by Patnaik (1949)
and Fix (1949).


2. THE DISTRIBUTION OF NON-CENTRAL F 


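If $\chi_1'^2$ is a value of non-central $\chi^2$ having $\nu_1$ degrees of freedom and $\chi_2^2$ a central $\chi^2$ with $\nu_2$
degrees of freedom independent of $\chi_1'^2$, then we may write

$$F' = \frac{\nu_2}{\nu_1}\, \frac{\chi_1'^2}{\chi_2^2} = \frac{\nu_2}{\nu_1}\, \frac{\sum_{i=1}^{\nu_1} (u_i + a_i)^2}{\sum_{i=\nu_1+1}^{\nu_1+\nu_2} u_i^2}, \qquad (4)$$

where all the $u_i$ (i = 1, 2, ..., $\nu_1 + \nu_2$) are independent normal deviates having unit variance
and zero mean. F' may be termed a non-central variance ratio. The probability density
distribution of F' is known to be

$$p(F') = \sum_{j=0}^{\infty} \frac{e^{-\frac{1}{2}\lambda} (\frac{1}{2}\lambda)^{j}}{j!\, B(\frac{1}{2}\nu_1 + j, \frac{1}{2}\nu_2)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{1}{2}\nu_1 + j} F'^{\,\frac{1}{2}\nu_1 + j - 1} \left(1 + \frac{\nu_1 F'}{\nu_2}\right)^{-\frac{1}{2}(\nu_1 + \nu_2) - j}, \qquad (5)$$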











where $\lambda = \sum_i a_i^2$ as before. The distribution is immediately derivable from the form given by
Fisher (1928) (his distribution C), who took as variable

$$R^2 = \chi_1'^2 / (\chi_1'^2 + \chi_2^2).$$

Any general tabulation of the probability integral of (5) would involve extensive computa-
tion as three parameters $\nu_1$, $\nu_2$ and $\lambda$ are involved.

The application of the F'-distribution with which we shall be concerned here is in con-
nexion with the analysis of variance. It is clear that if in a comparison of two variance esti-
mates the underlying variables in the 'treatment' sum of squares are non-central, then the
chance of establishing significance for a given non-centrality can be determined from the
probability integral of p(F'). Thus

$$\int_{F_\alpha}^{\infty} p(F')\, dF' = \beta(\lambda \mid \alpha, \nu_1, \nu_2), \qquad (6)$$

where $F_\alpha$ is the 100$\alpha$ % significance level of the ordinary variance ratio test, gives the pro-
bability of establishing significance at the 100$\alpha$ % level for a given value of $\lambda$. In other words
$\beta(\lambda \mid \alpha, \nu_1, \nu_2)$, regarded as a function of $\lambda$, is the power function of the analysis of variance
test. This use of the distribution of F' has been discussed by Tang (1938), who provided
certain tables, and by Patnaik (1949). Also Lehmer (1944) published additional tables and
formulae for the asymptotic distribution of F' for large $\nu_1$ and/or $\nu_2$.
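
With modern computing aids the integral (6) can be evaluated directly. The sketch below is an illustration only and is not the method by which the tables or charts were prepared: it integrates (5) term by term, so that the probability integral of F' becomes a Poisson-weighted sum of incomplete beta function ratios, evaluates each ratio by a standard continued fraction, and finds $F_\alpha$ by bisection. All function names, and the truncation of the series at 200 terms (adequate only for moderate λ), are assumptions of the example.

```python
import math


def _beta_cf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        steps = (m * (b - m) * x / ((qam + m2) * (a + m2)),
                 -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)))
        for aa in steps:
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Incomplete beta function ratio I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def noncentral_f_cdf(f, nu1, nu2, lam, terms=200):
    """P{F' <= f}: Poisson (mean lam/2) mixture of incomplete beta ratios."""
    x = nu1 * f / (nu1 * f + nu2)
    weight, total = math.exp(-lam / 2.0), 0.0
    for j in range(terms):
        total += weight * reg_inc_beta(nu1 / 2.0 + j, nu2 / 2.0, x)
        weight *= (lam / 2.0) / (j + 1)
    return total


def f_critical(alpha, nu1, nu2):
    """F_alpha of the ordinary (central) variance ratio test, by bisection."""
    lo, hi = 0.0, 1.0
    while noncentral_f_cdf(hi, nu1, nu2, 0.0, terms=1) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if noncentral_f_cdf(mid, nu1, nu2, 0.0, terms=1) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power(lam, alpha, nu1, nu2):
    """beta(lam | alpha, nu1, nu2) of equation (6)."""
    return 1.0 - noncentral_f_cdf(f_critical(alpha, nu1, nu2), nu1, nu2, lam)
```

For example, `power(8.0, 0.05, 2, 12)` would give the chance of establishing significance at the 5 % level when $\lambda = 8$, $\nu_1 = 2$ and $\nu_2 = 12$; with $\lambda = 0$ the function returns $\alpha$ itself.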

In the present paper we have provided charts which, we believe, are easier to handle than
these tables. Just as in Tang's tables on which they are largely based, the charts are confined
to variance ratio F-tests carried out at the 5 and 1 % levels of significance, i.e. the error of
the first kind, $\alpha$, is confined to values of 0.05 and 0.01. We have also retained Tang's non-
centrality parameter

$$\phi = \left\{\frac{\lambda}{\nu_1 + 1}\right\}^{\frac{1}{2}} = \left\{\frac{\sum_i a_i^2}{\nu_1 + 1}\right\}^{\frac{1}{2}}, \qquad (7)$$

in place of $\lambda$.

Eight charts are provided corresponding to the eight values $\nu_1$ = 1(1)8. Each chart
gives two families of power curves corresponding to $\alpha$ = 0.05 and 0.01. In each family an
individual power curve corresponds to a specified value of $\nu_2$ and gives the power $\beta(\phi \mid \alpha, \nu_1, \nu_2)$
as ordinate plotted against an abscissa scale of $\phi$. The curves are drawn for

$\nu_2$ = 6(1)10, 12, 15, 20, 30, 60 and ∞.

It should be noted that the $\beta$-scale is logarithmic, thereby expanding the scale in the im-
portant region of high power 0.80 < $\beta$ < 0.99, whilst at the same time straightening the curves
and facilitating interpolation. The $\beta$-grid and the harmonic $\nu_2$-grid are sufficiently fine to
permit interpolation by sight. The $\phi$-grid is equidistant and the vertical rules correspond
to steps of 0.2 (or 0.1) in $\phi$.

As we shall see below, in many applications the charts are used to obtain the power, $\beta$,
corresponding to given values of $\alpha$, $\nu_1$, $\nu_2$ and $\phi$, but often $\beta$ is specified, and it is required to find
$\nu_2$ or $\phi$. For such problems where inverse and multivariate interpolation is required, the
present graphical form appeared to us to have advantages over tabulation, particularly as
high accuracy is unnecessary. In what follows we are concerned with:

(a) A description of the method to be adopted in determining the form for $\phi$, given the
experimental set-up and the effect whose significance is under test.

(b) Illustrations of the use of the charts.













3. APPLICATION TO THE ANALYSIS OF VARIANCE 


To bring the distribution (5) into play, the mathematical structure which we must regard as
appropriate to the experimental data is one in which the random variation is due to a
normally distributed error element, superimposed upon a number of additive constant terms.
In the most general form, $x_i$ is defined by

$$x_i = \xi_i + z_i \quad (i = 1, 2, \ldots, N), \qquad (8)$$

where

(i) $\xi_i = c_{i1}\theta_1 + c_{i2}\theta_2 + \ldots + c_{is}\theta_s$ (i = 1, 2, ..., N); (9)

(ii) the $z_i$ are N independent normal deviates each having expectation zero and a
common variance $\sigma^2$.

The $\theta$'s are s parameters whose values are generally unknown, while the c's are known con-
stants, usually equalling 1 or 0. The null hypothesis specifies the values of, say, r of the $\theta$'s,
e.g. $\theta_j = \theta_j^0$ (j = 1, 2, ..., r). The power function of the test measures the probability of
establishing significance when $\theta_j - \theta_j^0 \ne 0$ (j = 1, 2, ..., r), the non-centrality parameter $\phi$
being a function of these differences and of $\sigma$.

It is well known that this mathematical model cannot always be regarded as appropriate;
for example, the differences $\theta_j - \theta_j^0$, if they are not zero, may be more reasonably regarded
as random variables rather than constants. Again, in many applications of analysis of
variance the residuals $z_i$ cannot be assumed independent. We think, however, that the model
implied by equations (8) and (9) is applicable to a considerable number of problems and
hope that by making the present charts available more thought may be focused on the
mathematical assumptions underlying methods of practical analysis in common use.*

Starting with this model, Kołodziejczyk (1935) showed how it was possible to determine
in a systematic way the two appropriate sums of squares to compare in a test of any
hypothesis concerning the values of the $\theta$'s. He described the general procedure as the test of
a linear hypothesis. In deriving the power function of the test, Tang (1938) also started from
this basis. It is usually unnecessary, however, to determine from first principles which are
the appropriate sums of squares to compare, as they are known from the well-recognized
character of the experimental arrangement. We shall therefore suppose that the two sums
of squares appropriate for a given comparison are known; they may be termed the
'treatment' sum of squares $S_1$ based on $\nu_1$ degrees of freedom and the 'error' sum of squares $S_2$
based on $\nu_2$ degrees of freedom. All that is necessary for use of the charts is to determine the
$\phi$ of equation (7). As Tang has shown, the procedure is simple:

(a) We must define the assumed algebraic set-up of equations (8) and (9) and note which
are the parameters $\theta_1, \theta_2, \ldots, \theta_r$ whose values are specified by the null hypothesis.

(b) $S_1$ will be expressible as a sum of squares of linear functions of the $x_i$. $\lambda\sigma_e^2$ is then the
value of $S_1$ obtained by substituting into this sum of squares $\sum_{j=1}^{r} c_{ij}\theta_j$ in place of $x_i$
$(i = 1, 2, \ldots, N)$. Since the form of $S_1$ is such that the remaining $\theta_j$ $(j = r+1, r+2, \ldots, s)$
cancel out on substitution for $x_i$, the procedure is equivalent to making the $z_i$ in equation
(8) zero and substituting $\xi_i$ for $x_i$.

(c) Finally we have
$$\phi = \sqrt{\lambda/(\nu_1 + 1)}.$$
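Rules (a) to (c) can be traced numerically. The following is a minimal sketch, assuming a one-way layout with equal group sizes as the experimental arrangement; the function names and the numerical inputs are illustrative assumptions and do not come from the paper.

```python
from math import sqrt

def treatment_ss_one_way(x):
    """Treatment sum of squares S_1 for a one-way layout with equal group sizes.

    x[t][i] is observation i in group t.
    """
    k, n = len(x), len(x[0])
    group_means = [sum(row) / n for row in x]
    grand_mean = sum(group_means) / k
    return n * sum((m - grand_mean) ** 2 for m in group_means)

def phi_by_substitution(effects, n, sigma_e, level=0.0):
    """Apply rules (b) and (c): set every z to zero, evaluate S_1, then scale."""
    k = len(effects)
    # Rule (b): replace each observation by its expectation (general level + group effect).
    xi = [[level + b] * n for b in effects]
    lam = treatment_ss_one_way(xi) / sigma_e ** 2
    nu1 = k - 1
    # Rule (c): phi = sqrt(lambda / (nu1 + 1)).
    return sqrt(lam / (nu1 + 1))

# Hypothetical effects summing to zero; the general level cancels out of S_1.
print(phi_by_substitution([1.0, -1.0, 0.0], n=4, sigma_e=2.0))
print(phi_by_substitution([1.0, -1.0, 0.0], n=4, sigma_e=2.0, level=10.0))
```

Both calls give the same value, which illustrates the remark in (b) that parameters not specified by the null hypothesis cancel on substitution.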


* The distinction between alternative mathematical constructions underlying the analysis of variance 
has been discussed by several writers, e.g. Daniels (1939), Eisenhart (1947) and Johnson (1948). 








[Charts, pp. 115-22. Power function for analysis of variance tests: one chart for each value
of $\nu_1$ from 1 to 8. In each chart the chance of a significant result, $\beta$ (vertical scale, 0.10 to
0.99), is plotted against $\phi$, with separate horizontal scales for $\alpha = 0.05$ and $\alpha = 0.01$, and
with one curve for each of $\nu_2$ = 6, 7, 8, 9, 10, 12, 15, 20, 30, 60, $\infty$.]







4. THE VALUE OF THE NON-CENTRALITY PARAMETER $\phi$ FOR SPECIAL CASES


The procedure can be made clear by applying the rule to particular forms of analysis. For
reasons which will be at once obvious, we shall replace the generalized $\theta$'s by parameters
A, B, C, etc.


(4.1) The one-way classification into k groups with n observations in each
The set-up assumed may be written
$$x_{ti} = A + B_t + z_{ti} \quad (t = 1, \ldots, k;\ i = 1, \ldots, n). \tag{10}$$
Here we shall take $\sum_{t=1}^{k} B_t = 0$, representing the average level by $A$. The null hypothesis is that
$$B_t = 0 \quad (t = 1, \ldots, k-1). \tag{11}$$
Note that since $\sum B_t = 0$ there are only $k$ independent parameters in (10) and the hypothesis
specifies values for $k-1$ of them. For the analysis
$$S_1 = n\sum_t (\bar{x}_{t.} - \bar{x}_{..})^2,\quad \nu_1 = k-1; \qquad S_2 = \sum_t\sum_i (x_{ti} - \bar{x}_{t.})^2,\quad \nu_2 = N-k,$$
where $\bar{x}_{t.}$ is the mean for the observations in the $t$th group, $\bar{x}_{..}$ the grand mean and $N = kn$.
Since $\bar{x}_{t.} = A + B_t + \bar{z}_{t.}$, $\bar{x}_{..} = A + \bar{z}_{..}$,
it is seen from rules (b) and (c) in §3 that $\lambda\sigma_e^2 = n\sum_t B_t^2$, and
$$\phi = \frac{1}{\sigma_e}\sqrt{\frac{n}{k}\sum_t B_t^2} = \frac{\sqrt{\frac{1}{k}\sum_t B_t^2}}{\sigma_e/\sqrt{n}}. \tag{12}$$
From the second form of (12) it is seen that $\phi$ is the ratio of the standard deviation of the
$k$ values of $B_t$ to the standard error of the $\bar{x}_{t.}$.
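The two forms of (12) can be compared numerically. This is a small sketch only: the function names and the group effects below are hypothetical, chosen so that the effects sum to zero as the set-up requires.

```python
from math import sqrt

def phi_one_way(effects, n, sigma_e):
    """First form of (12): (1 / sigma_e) * sqrt(n * sum(B_t^2) / k)."""
    k = len(effects)
    return sqrt(n * sum(b * b for b in effects) / k) / sigma_e

def phi_as_ratio(effects, n, sigma_e):
    """Second form of (12): s.d. of the B_t over the standard error of a group mean."""
    k = len(effects)
    sd_effects = sqrt(sum(b * b for b in effects) / k)
    return sd_effects / (sigma_e / sqrt(n))

effects = [3.0, -1.0, -2.0]  # hypothetical B_t, summing to zero
print(phi_one_way(effects, n=9, sigma_e=4.0))
print(phi_as_ratio(effects, n=9, sigma_e=4.0))
```

With two groups and effects $\pm\tfrac12\Delta$, the same function reduces to $\tfrac12\Delta\sqrt{n}/\sigma_e$.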


In the special case of two groups ($\nu_1 = 1$, $k = 2$) we have $B_1 = -B_2$, so that the
non-centrality $\phi$ is directly related to the distance apart, $\Delta = 2|B_1|$, of the two group means,
$B_1$ and $B_2$. We find that $\phi = \tfrac12\Delta\sqrt{n}/\sigma_e$. The test is therefore identical with the symmetrical
form of the non-central t-test as described by Neyman & Tokarska (1936).* Their notation,
that of the general tables of the non-central t-distribution given by Johnson & Welch (1940)
and that used by Lord (1950), are related by the following table of equivalents:

                                  Notation used by

Present authors   Neyman & Tokarska                              Johnson & Welch    Lord
$\alpha$          $2\alpha$                                      (none)             $\alpha$
$\phi$            $2^{-1/2}\rho$                                 $2^{-1/2}\delta$   $2^{-1/2}\rho$
$\nu_2$           $n$                                            $f$                $\nu$
$\beta$           $2 - P_{II}(\alpha,\rho) - P_{II}(\alpha,-\rho)$   (none)         $\beta'(\rho) + \beta''(\rho)$




















* Note that Neyman & Tokarska's $\rho = \Delta/\sigma$, where $\sigma$ is the standard deviation of the normal deviate
$x$ used in estimating $\Delta$. Thus for $x = \bar{x}_{1.} - \bar{x}_{2.}$, $\sigma = \sigma_e(2/n)^{1/2}$.










The $\alpha$ in the last line corresponds to that of Neyman & Tokarska's notation. Since for all
points plotted on our chart for $\nu_1 = 1$, $P_{II}(\alpha, -\rho) = 1$ to the plotting accuracy possible, this
chart can be used also for the asymmetrical non-central t-test, using 2.5 and 0.5 % as
significance levels.

If the numbers of observations in the groups are unequal, then $\lambda\sigma_e^2 = \sum_t n_t B_t^2$. However,
since the use of the power function is in connexion with advance planning of a controlled
experiment, the $n_t$ will generally be taken equal.


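(4.2) The double classification with one observation in each cell
Set-up:
$$x_{st} = A + B_s + C_t + z_{st} \quad (s = 1, \ldots, h;\ t = 1, \ldots, k). \tag{13}$$
Again we shall take $\sum_s B_s = 0 = \sum_t C_t$. If we are testing whether a B-effect exists, the null
hypothesis is that
$$B_s = 0 \quad (s = 1, \ldots, h-1).$$
For the analysis
$$S_1 = k\sum_s (\bar{x}_{s.} - \bar{x}_{..})^2,\quad \nu_1 = h-1; \qquad S_2 = \sum_s\sum_t (x_{st} - \bar{x}_{s.} - \bar{x}_{.t} + \bar{x}_{..})^2,\quad \nu_2 = (h-1)(k-1).$$
We find at once that $\bar{x}_{s.} - \bar{x}_{..} = B_s + \bar{z}_{s.} - \bar{z}_{..}$, so that $\lambda\sigma_e^2 = k\sum_s B_s^2$ and
$$\phi = \frac{1}{\sigma_e}\sqrt{\frac{k}{h}\sum_s B_s^2} = \frac{\sqrt{\frac{1}{h}\sum_s B_s^2}}{\sigma_e/\sqrt{k}}. \tag{14}$$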
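(4.3) The double classification with n observations in each cell


In this case it is possible to examine for the presence of interaction and the set-up may be
written*
$$x_{sti} = A + B_s + C_t + D_{st} + z_{sti} \quad (s = 1, \ldots, h;\ t = 1, \ldots, k;\ i = 1, \ldots, n). \tag{15}$$
Again we shall take $\sum_s B_s = \sum_t C_t = 0$ and also
$$\sum_s D_{st} = 0 \quad (t = 1, \ldots, k) \quad\text{and}\quad \sum_t D_{st} = 0 \quad (s = 1, \ldots, h).$$
If we test for the presence of a main effect, say whether $B_s = 0$ $(s = 1, 2, \ldots, h-1)$, we find that
$$\phi = \frac{1}{\sigma_e}\sqrt{\frac{kn}{h}\sum_s B_s^2} = \frac{\sqrt{\frac{1}{h}\sum_s B_s^2}}{\sigma_e/\sqrt{kn}}. \tag{16}$$
If we test for the presence of interaction, the null hypothesis is that $D_{st} = 0$ for all
combinations of $s$ and $t$ where, in view of the relations above, only $(h-1)(k-1)$ are independent.
For the analysis,
$$S_1 = n\sum_s\sum_t (\bar{x}_{st.} - \bar{x}_{s..} - \bar{x}_{.t.} + \bar{x}_{...})^2,\quad \nu_1 = (h-1)(k-1);$$
$$S_2 = \sum_s\sum_t\sum_i (x_{sti} - \bar{x}_{st.})^2,\quad \nu_2 = hk(n-1).$$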


* We are here concerned with systematic main effects $B_s$, $C_t$ and interactions $D_{st}$. Often it is more
appropriate to regard one of the main effects (e.g. 'Blocks') and the interaction as random variables, and
in this case the appropriate test procedure is different.





It is found that $\lambda\sigma_e^2 = n\sum_s\sum_t D_{st}^2$, so that
$$\phi = \frac{1}{\sigma_e}\sqrt{\frac{n\sum_s\sum_t D_{st}^2}{(h-1)(k-1)+1}}. \tag{17}$$
Since there are now $hk$ different $D_{st}$, $\phi$ has not now the simple interpretation of a ratio of two
standard deviations which it had in previous cases.





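(4.4) The Latin square
Here, if we have a $k \times k$ square,
$$x_{rst} = A + B_r + C_s + D_t + z_{rst} \quad (r = 1, \ldots, k;\ s = 1, \ldots, k;\ t = 1, \ldots, k).$$
Let $B_r$ and $C_s$ represent the row and column terms and $D_t$ the possible treatment effects.
We assume
$$\sum_r B_r = \sum_s C_s = \sum_t D_t = 0,$$
and the null hypothesis is that
$$D_t = 0 \quad (t = 1, \ldots, k-1).$$
For the analysis
$$S_1 = k\sum_t (\bar{x}_{..t} - \bar{x}_{...})^2,\quad \nu_1 = k-1; \qquad S_2 = \sum (x_{rst} - \bar{x}_{r..} - \bar{x}_{.s.} - \bar{x}_{..t} + 2\bar{x}_{...})^2,\quad \nu_2 = (k-1)(k-2),$$
the sum in $S_2$ being taken over the $k^2$ cells of the square.
We find that $\lambda\sigma_e^2 = k\sum_t D_t^2$ and
$$\phi = \frac{1}{\sigma_e}\sqrt{\sum_t D_t^2} = \frac{\sqrt{\frac{1}{k}\sum_t D_t^2}}{\sigma_e/\sqrt{k}}. \tag{18}$$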


5. THE INTERPRETATION OF THE POWER FUNCTION 


It will be useful at this point to consider two examples: 

Example 1. Effect of machine variation on the standard deviation of the manufactured bulk.
In a machine shop six machines contribute equal shares to the total output of mass-produced
parts for which a dimension, $x$, is subject to tolerances. It is suspected that the average
dimension of the parts produced by the six machines may differ from machine to machine, thereby
increasing the standard deviation of $x$ for the bulk. It is further feared that if such an
increase should exceed 20 % of the standard deviation of $x$ for parts from a single machine
previously employed, difficulties would arise in meeting the tolerances.

In order to have advance warning of machine differences of such a magnitude, it is planned 
to measure nine parts for each of the six machines and test for machine differences in an 
analysis of variance, using the 1 % level of significance. What is the chance of detecting 
machine differences of such a magnitude by this test? 

It would seem reasonable to adopt the set-up (4.1). Let $A + B_t$ denote the average
dimension of parts produced by the $t$th machine, $x_{ti}$ the dimension of the $i$th part measured for
the $t$th machine and $\sigma$ the standard deviation of dimensions of parts from the same machine.
Then it is desired to detect by the test the presence of differences among the $B_t$ such that
$$\sqrt{\tfrac16\sum_t B_t^2 + \sigma^2}\,\Big/\,\sigma > 1.2 \quad (\text{increase of 20\,\% in } \sigma),$$
or that
$$\sqrt{\tfrac16\sum_t B_t^2}\,\Big/\,\sigma > \sqrt{1.2^2 - 1} = \sqrt{0.44} = 0.66.$$
Hence $\phi = 0.66\sqrt{9} = 1.99$ is the critical value.
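The arithmetic of this step is easily reproduced. The sketch below is illustrative only; the function name is an assumption, and the inputs are the six machines, nine parts per machine and 20 % increase stated in the example.

```python
from math import sqrt

def critical_phi(k, n, ratio):
    """Critical phi for a one-way layout of k groups of n observations.

    `ratio` is the bulk standard deviation divided by the single-group one.
    Returns (rms effect in sigma units, phi, nu1, nu2).
    """
    rms_over_sigma = sqrt(ratio ** 2 - 1)   # sqrt(sum(B_t^2) / k) / sigma
    phi = rms_over_sigma * sqrt(n)          # second form of (12)
    return rms_over_sigma, phi, k - 1, k * (n - 1)

rms, phi, nu1, nu2 = critical_phi(k=6, n=9, ratio=1.2)
print(round(rms, 2), round(phi, 2), nu1, nu2)
```

The degrees of freedom returned are those with which the chart is entered.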













Entering the chart for $\nu_1 = 6 - 1 = 5$ and the section for $\alpha = 0.01$ (1 % level of
significance), we find for $\phi = 1.99$ and $\nu_2 = 6(9-1) = 48$ the value of $\beta = 0.86$. This means that,
following the procedure described, our chance of detecting machine differences of the above
magnitude is 0.86, i.e. the odds are about 6 to 1 on detection. Should these not be considered
sufficient, we must make provision for more parts to be measured, until a satisfactory value
of $\beta$ is reached.

Whilst the planning of the present experiment was concerned with the value of the
root-mean-square of the group means $B_t$, situations may of course arise in which the values of
individual $B_t$ are of interest.

Example 2. The effect of personal factors introduced by test operators in certain routine tests. 
Davies (1947, p. 90) has described an investigation into the possible influence of personal 
factors on the results of a standard procedure of testing the compressive strength of samples 
of Portland cement. Two classes of men are employed in the test: the ‘gauger’ who works 
up the mixture of cement and water from which the test cubes are moulded, and the assistant 
who operates the testing machine, who is described as a ‘breaker’. Let us suppose, as in the 
case of the data given, that there were three assistants regularly employed on the testing and 
three labourers who were used for the gauging. To investigate the possible effects of personal 
factors among the men, we may plan a simple 3 x 3 experiment in which n cubes are tested 
for each of the nine combinations of gauger and breaker. If we were concerned with six speci- 
fied men who were to be employed on this work in a routine way, it would seem reasonable to 
adopt the set-up of § (4.3) above. $B_1$, $B_2$ and $B_3$ will represent the possible gauger effects and
$C_1$, $C_2$ and $C_3$ the breaker effects. In testing for the presence of either of these, we shall have
$$\nu_1 = 2, \quad \nu_2 = 9(n-1),$$
while from equation (16)
$$\phi_B = \sqrt{n\sum_s B_s^2}\,\Big/\,\sigma_e,$$
and similarly for $\phi_C$. If we were to decide to take $n = 4$, so that $\nu_2 = 27$, we can derive, for
example, the following information from the chart for $\nu_1 = 2$:

(a) The probability will be at least 0.90 of establishing significance for a gauger (or breaker)
effect at the 5 % level of $F$ if
$$\phi \ge 2.16 \quad\text{or if}\quad \sqrt{\tfrac13\sum_s B_s^2} \ge \frac{2.16}{\sqrt{12}}\,\sigma_e = 0.62\sigma_e.$$

(b) To have the same probability, using the 1 % significance level, we must have
$$\phi \ge 2.63 \quad\text{or}\quad \sqrt{\tfrac13\sum_s B_s^2} \ge 0.76\sigma_e.$$

(c) If, however, the values of the men's averages were less scattered, i.e. only such that
$$\sqrt{\tfrac13\sum_s B_s^2} = 0.5\sigma_e, \quad\text{or}\quad \phi = \sqrt{12}\times 0.5 = 1.73,$$
then there would be probabilities of only (i) 0.72 and (ii) 0.45 of establishing significance at
(i) the 5 % and (ii) the 1 % level.

(d) Were the number of tests for each gauger-breaker combination increased from $n = 4$
to $n = 8$, so that $\nu_2$ becomes 63, then, still using the same chart, we find that $\sqrt{\tfrac13\sum_s B_s^2}$ need
only be (i) $0.43\sigma_e$ and (ii) $0.51\sigma_e$ to give a probability of 0.90 of establishing significance at
(i) the 5 % and (ii) the 1 % level.
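The conversions between $\phi$ and the root-mean-square of the gauger effects in (a) to (c) follow from equation (16) with $h = k = 3$. A minimal sketch, in which the function names are assumptions and the values of $\phi$ are those read from the chart in the text:

```python
from math import sqrt

def rms_effect_threshold(phi, n):
    """Root-mean-square of the three B_s, in units of sigma_e, for a given phi.

    Uses phi = sqrt(n * sum(B_s^2)) / sigma_e, i.e. equation (16) with h = k = 3.
    """
    return phi / sqrt(3 * n)

def phi_from_rms(rms_in_sigma_units, n):
    """Inverse conversion: phi for a given root-mean-square effect."""
    return rms_in_sigma_units * sqrt(3 * n)

print(round(rms_effect_threshold(2.16, 4), 2))  # case (a), 5 % level
print(round(rms_effect_threshold(2.63, 4), 2))  # case (b), 1 % level
print(round(phi_from_rms(0.5, 4), 2))           # case (c)
print(9 * (8 - 1))                              # nu2 for n = 8 in case (d)
```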







This example brings out two points needing further consideration. In the first place the
root-mean-square of the group-means $B_s$ is expressed in terms of $\sigma_e$, the standard deviation
of the normal residuals. It will often happen that this is roughly known in advance of the
experiment. For example, it might have been known in advance from past experience that
if a breaker tests a series of cement cubes made from the same mix, the standard deviation
of his results would be about 500 lb./sq.in. Having this information, rough numerical figures
could be assigned to the critical values for $\sqrt{\tfrac13\sum_s B_s^2}$ quoted above. Thus for case (a) we have
$$\sqrt{\tfrac13\sum_s B_s^2} = 0.62 \times 500 = 310 \text{ lb./sq.in.}$$

The second point is this. How far does the root-mean-square of the $B_s$ provide the kind of
information we need in a preliminary survey? To generalize for the purpose of the present
discussion, write the relevant factors $B_s$, $C_t$, ..., etc. as $\delta_j$ $(j = 1, \ldots, k)$, where
$$\sum_j \delta_j = 0, \quad \sum_j \delta_j^2 = c^2\phi^2\sigma_e^2, \tag{19}$$
the constant $c$ depending on the arrangement and being given for particular cases by (12),
(14), (16), (17) and (18) above.* For a given value of the non-centrality parameter, whether
expressed as $\lambda$ or $\phi$ or $\sum\delta_j^2$, there will be an infinite set of possible combinations of the $\delta_j$.
Can we pick out some characteristics of this set to which we can attach particular meaning?

In some unpublished notes completed before he returned to China in 1938, Tang suggested
the following approach. For every set of values of the $\delta_j$ satisfying (19) there will be a value
which is numerically greatest. This set of greatest numerical values must be bounded; he
wrote the lower and upper bounds as follows:

$\Delta_L$ = the least upper deviation,
$\Delta_G$ = the greatest upper deviation,

and showed that
$$\Delta_L = \begin{cases} c\phi\sigma_e/\sqrt{k} & k \text{ even},\\ c\phi\sigma_e/\sqrt{k-1} & k \text{ odd},\end{cases} \tag{20}$$
$$\Delta_G = c\phi\sigma_e\sqrt{(k-1)/k}.$$

The meaning of this result he interpreted as follows:

For given values of $\phi$ and $\sigma_e$ ($c$ being defined by the experimental arrangement) there must
be at least one $\delta_j$ whose numerical value is as great or greater than $\Delta_L$, but there can be no
$\delta_j$ whose numerical value is greater than $\Delta_G$.

$\Delta_L$ and $\Delta_G$ are clearly not the only characteristics of interest. We are inclined to think that
the range or spread of the $\delta_j$ may be even more relevant. We will denote by W(max.) and
W(min.) the upper and lower bounds of the ranges of the combination of $\delta$'s satisfying (19)
for fixed $\phi$ and $\sigma_e$. Then it can be shown that
$$W(\text{max.}) = c\phi\sigma_e \times \sqrt{2}, \tag{21}$$
$$W(\text{min.}) = \begin{cases} c\phi\sigma_e \times 2/\sqrt{k} & k \text{ even},\\ c\phi\sigma_e \times 2\sqrt{k/(k^2-1)} & k \text{ odd}.\end{cases} \tag{22}$$

The maximum range occurs when the extreme $\delta_j$ are $\pm c\phi\sigma_e/\sqrt{2}$ and the remaining $k-2$ zero.


* This form of relation between the $\delta$'s and $\phi$ is appropriate for all the arrangements considered in § 4,
but it will not always be so in tests of the general linear hypothesis. For example, if there are unequal
frequencies in subgroups, the weighted sum of squares of the $\delta$'s is involved.










The minimum range occurs, (i) if $k$ is even, when $\tfrac12 k$ of the $\delta_j$ equal $+c\phi\sigma_e/\sqrt{k}$ and the
remaining $\tfrac12 k$ of the $\delta_j$ equal $-c\phi\sigma_e/\sqrt{k}$; (ii) if $k = 2p + 1$ is odd, when $p + 1$ of the $\delta_j$ equal
$$\pm c\phi\sigma_e\sqrt{(k-1)/\{(k+1)k\}},$$
and the remaining $p$ equal
$$\mp c\phi\sigma_e\sqrt{(k+1)/\{(k-1)k\}}.$$
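The bounds and the configurations said to attain them can be checked against the two constraints of (19). In the sketch below, $R$ stands for the product $c\phi\sigma_e$; the function names are assumptions, and the odd-$k$ value of the least upper deviation, $R/\sqrt{k-1}$, is taken as an assumption about the form of (20).

```python
from math import sqrt

def tang_bounds(R, k):
    """(least, greatest) bound for the numerically largest of k values
    with sum zero and sum of squares R**2."""
    least = R / sqrt(k) if k % 2 == 0 else R / sqrt(k - 1)
    greatest = R * sqrt((k - 1) / k)
    return least, greatest

def range_bounds(R, k):
    """(W_min, W_max) for the range of k values with sum zero and sum of squares R**2."""
    w_max = R * sqrt(2)
    w_min = 2 * R / sqrt(k) if k % 2 == 0 else 2 * R * sqrt(k / (k * k - 1))
    return w_min, w_max

def min_range_config(R, k):
    """A configuration attaining W_min."""
    if k % 2 == 0:
        a = R / sqrt(k)
        return [a] * (k // 2) + [-a] * (k // 2)
    p = (k - 1) // 2
    a = R * sqrt((k - 1) / ((k + 1) * k))
    b = R * sqrt((k + 1) / ((k - 1) * k))
    return [a] * (p + 1) + [-b] * p

def max_range_config(R, k):
    """A configuration attaining W_max: two opposite extremes, the rest zero."""
    return [R / sqrt(2), -R / sqrt(2)] + [0.0] * (k - 2)

R, k = 1.0, 3
for cfg in (min_range_config(R, k), max_range_config(R, k)):
    # Each line shows: sum (zero), sum of squares (R squared), range.
    print(round(sum(cfg), 12), round(sum(d * d for d in cfg), 12), max(cfg) - min(cfg))
print(range_bounds(R, k), tang_bounds(R, k))
```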


The interpretation is as follows. For given values of $\phi$ and $\sigma_e$ the range of the $\delta_j$, i.e. the
spread in the expectations of the $x$'s resulting from treatment differences, must be at least
W(min.) but cannot exceed W(max.).* Alternatively, given $W$ and $\sigma_e$, we can invert the
relations (21) and (22) to obtain lower and upper bounds for $\phi$ and therefore for the power
of the test. These points may be illustrated on the example considered on pp. 126-7 above.
Here, for the differences between gaugers,
$$\sum_{s=1}^{3} B_s^2 = \frac{1}{n}\phi^2\sigma_e^2,$$
or the $c$ of equations (19), (21) and (22) is $c = 1/\sqrt{n}$. Suppose that having regard to the general
programme of routine testing and the conclusions to be based upon it, it was considered that
the maximum spread of what may be termed the gauger bias should not be more than
$W$ = 250 lb./sq.in. Further, suppose again that our rough estimate of $\sigma_e$ is 500 lb./sq.in.

(a) We might first ask if there is any prospect of establishing significance at the 5 % level
when $W$ = 250 lb./sq.in. if a 3 x 3 experimental arrangement were used with only $n = 4$
replications. $k = 3$ and is odd, so from (22) we see that
$$\phi(\text{max.}) = \frac{W}{2c\sigma_e}\sqrt{\frac{k^2-1}{k}} \tag{23}$$
$$= \frac{250}{2\times 500}\sqrt{\frac{4\times 8}{3}} = \sqrt{\tfrac23} = 0.816.$$
Turning to the chart for $\nu_1 = 2$, with $\nu_2 = 27$, it is clear that the value of $\phi$ is outside the
range of both charts. Extrapolation suggests that there is a probability of only about 0.25
of establishing significance at the 5 % level. Thus four replications would be quite inadequate
to detect a spread of 250 lb./sq.in.

(b) As indicated under (a) on p. 126 above, for the probability of establishing significance
at the 5 % level to be 0.90 or more, we must have $\phi \ge 2.16$. Taking $W$ and $\sigma_e$ as before we
have from (21)
$$\phi(\text{min.}) = \frac{W}{c\sigma_e\sqrt{2}} = \frac{\sqrt{n}}{2\sqrt{2}}. \tag{24}$$
Thus, to ensure that the probability of establishing significance at the 5 % level is 0.90 or
more, we must make $\sqrt{n}/(2\sqrt{2}) \ge 2.16$ or $n \ge 37.3$. Thus 38 replications would be needed.

(c) Suppose now that we were content with a reasonable chance of establishing significance
if the spread in gauger bias were $W$ = 500 lb./sq.in. From (23) and (24) we have
$$\phi(\text{max.}) = 0.816\sqrt{n}, \quad \phi(\text{min.}) = 0.707\sqrt{n}.$$


* Whilst for small $k$ the upper and lower bounds W(max.) and W(min.) are not too far apart and may
be used as indicated in the example below, their increasing divergence for the larger $k$ may make the
interpretation illustrated less useful. In such cases it may be possible to interpret the practical
significance of a value of $\phi$ directly (as in Example 1) or to make some additional assumptions on the distribution
of the $\delta_j$.







Taking $n = 4$ and examining the chart for $\nu_1 = 2$ again, we find that the probability
of establishing significance at the 5 % level lies between 0.66 and 0.52. The former will
result if, say, $B_1 = B_2 = 167$ lb./sq.in. and $B_3 = -333$ lb./sq.in.; and the latter if
$B_1 = -B_2 = 250$ lb./sq.in., $B_3 = 0$.

(d) For the probability to be at least 0.90 the number of replications must satisfy the
inequality $n \ge (2.16/0.707)^2 = 9.3$, or 10 replications would suffice.
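The numerical steps of (a) to (d) can be retraced as follows. This is a sketch under the assumptions of the example ($k = 3$, $c = 1/\sqrt{n}$, $\sigma_e$ = 500 lb./sq.in.); the function names are illustrative, and the target value 2.16 is the chart reading quoted in the text.

```python
from math import sqrt, ceil

def phi_limits(W, sigma_e, n, k=3):
    """(phi_min, phi_max) for a given spread W, with c = 1 / sqrt(n).

    phi_min inverts (21); phi_max inverts the odd-k line of (22).
    """
    phi_min = W / sigma_e * sqrt(n / 2)
    phi_max = W / (2 * sigma_e) * sqrt(n * (k * k - 1) / k)
    return phi_min, phi_max

def replications_needed(phi_target, W, sigma_e):
    """Smallest n for which phi_min reaches phi_target."""
    return ceil((phi_target * sigma_e * sqrt(2) / W) ** 2)

print(phi_limits(250, 500, 4))               # case (a): phi_max is about 0.816
print(replications_needed(2.16, 250, 500))   # case (b)
print(phi_limits(500, 500, 4))               # case (c) with n = 4
print(replications_needed(2.16, 500, 500))   # case (d)
```

The power values themselves (0.25, 0.66, 0.52) are chart readings and are not computed here.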


6. COMPUTATION FOR THE CHARTS 


The basic formula for the computation of the present charts was an approximation to the
non-central F-distribution, defined by (5). Patnaik (1949) showed that, to a fair degree of
accuracy, the distribution of non-central $F'$ based on $\nu_1$ and $\nu_2$ degrees of freedom could be
represented by that of $cF(\nu, \nu_2)$, where $F(\nu, \nu_2)$ is the ordinary central F-ratio in which the
degrees of freedom of the denominator of $F'$ have been retained, whilst the degrees of freedom
($\nu$) of the numerator are given by
$$\nu = (\nu_1 + \lambda)^2/(\nu_1 + 2\lambda),$$
and the scale factor $c$ is given by
$$c = (\nu_1 + \lambda)/\nu_1.$$
Using this approximation and the familiar formula for the F-integral in terms of the
Incomplete B-function, viz.
$$\beta = I_x(\tfrac12\nu_2, \tfrac12\nu), \quad x = \nu_2/(\nu_2 + \nu F),$$
values of $\beta(\phi, \nu_1, \nu_2)$ were computed from K. Pearson's Tables of the Incomplete Beta-function
for a grid of values for $\phi$, $\nu_1$, $\nu_2$ which included the rather wide grid in Tang's table. At these
latter grid points a comparison with Tang's results provided a check on Patnaik's
approximation. It was found, however, that in order to determine the correction required for the
remaining values of the grid computed from Patnaik's approximation, it was necessary to
compute further exact values, particularly for large $\beta$, where our logarithmic scale
necessitates high accuracy. For this purpose Tang's recurrence formula was used for small $\nu_2$
($\nu_2$ = 6-8), whilst for large $\nu_2$ a new formula for non-central $\chi^2$ provided by Mr F. W. J.
Olver, of the National Physical Laboratory, proved helpful. With the help of this formula
small tables of non-central $\chi^2$ were computed and 'Studentized' according to Hartley's
formulae (1944), in which the differentials of non-central $\chi^2$ were replaced by their
finite-difference expansions. This method provided check answers for $\nu_2$ = 20, 30, 60 and $\infty$.
With the help of these further exact values a table of corrections,
$$\beta(\text{exact}) - \beta(\text{Patnaik's approximation}),$$
could be prepared, which permitted rough interpolation and thus enabled the remaining
values of the grid to be corrected to an accuracy sufficient for the purpose of plotting the
present charts. Further plotting points were provided by the non-central F-table recently
computed by Lehmer (1944) for $\beta$ = 0.7 and 0.8. All values for $\nu_1 = 1$ were computed
directly from Johnson & Welch's (1940) table of non-central t, and many of those for $\nu_2 = \infty$
from a table of non-central $\chi^2$ recently computed by Fix (1949).
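For readers without the printed tables, the uncorrected Patnaik approximation can be sketched in a few lines. The code below is an illustration only and is not the computing method used for the charts: it evaluates the incomplete beta-function by Simpson's rule (an assumption made here for self-containment, adequate when the first argument is at least 1, i.e. $\nu_2 \ge 2$), finds the significance point of $F$ by bisection, and applies no correction term, so its output may differ slightly from the chart readings. All function names are hypothetical.

```python
from math import exp, lgamma

def reg_inc_beta(x, a, b, panels=4000):
    """Regularized incomplete beta I_x(a, b) by Simpson's rule (assumes a >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / panels

    def g(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    total = g(0.0) + g(x)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3 / exp(lgamma(a) + lgamma(b) - lgamma(a + b))

def f_upper_tail(f, d1, d2):
    """P(F > f) for a central F(d1, d2), using x = d2 / (d2 + d1 * f)."""
    return reg_inc_beta(d2 / (d2 + d1 * f), d2 / 2, d1 / 2)

def f_critical(alpha, d1, d2):
    """Upper 100*alpha % point of central F(d1, d2), found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f_upper_tail(mid, d1, d2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def patnaik_power(phi, nu1, nu2, alpha):
    """Approximate power: P(c * F(nu, nu2) > F_alpha(nu1, nu2))."""
    lam = phi ** 2 * (nu1 + 1)                 # from phi = sqrt(lambda / (nu1 + 1))
    nu = (nu1 + lam) ** 2 / (nu1 + 2 * lam)    # Patnaik's numerator degrees of freedom
    c = (nu1 + lam) / nu1                      # Patnaik's scale factor
    return f_upper_tail(f_critical(alpha, nu1, nu2) / c, nu, nu2)

# Example 1 of section 5: nu1 = 5, nu2 = 48, phi = 1.99, 1 % level.
print(patnaik_power(1.99, 5, 48, 0.01))
```

The printed value should lie close to the chart reading quoted for Example 1, the difference being of the order of the corrections described above.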


For the organization of the computational work we are again indebted to Mr T. Vickers 
of the Mathematics Division of the National Physical Laboratory, as well as to Miss J. H. 
Thompson of the Department of Statistics, University College. 



The suggestion that Tang’s tables might be put into more convenient form for practical 
use was originally made to us by Dr G. P. Sillitto of Imperial Chemical Industries Ltd., 
and we are indebted to that firm for a substantial grant towards the cost of computation. 


REFERENCES 


Daniels, H. E. (1939). J.R. Statist. Soc. Suppl., 6, 186.
Davies, O. L. (1947). Statistical Methods in Research and Production. Edinburgh: Oliver and Boyd.
Eisenhart, C. (1947). Biometrics, 3, 1.
Fisher, R. A. (1928). Proc. Roy. Soc. A, 121, 654.
Fix, Evelyn (1949). Univ. Calif. Publ. Statist. 1, 15.
Hartley, H. O. (1944). Biometrika, 33, 173.
Johnson, N. L. (1948). Biometrika, 35, 80.
Johnson, N. L. & Welch, B. L. (1940). Biometrika, 31, 362.
Kolodziejczyk, S. (1935). Biometrika, 27, 161.
Lehmer, Emma (1944). Ann. Math. Statist. 15, 388.
Lord, E. (1950). Biometrika, 37, 64.
Neyman, J. & Tokarska, B. (1936). J. Amer. Statist. Ass. 31, 318.
Patnaik, P. B. (1949). Biometrika, 36, 202.
Tang, P. C. (1938). Statist. Res. Mem. 2, 126.







SOME QUESTIONS OF DISTRIBUTION IN THE THEORY 
OF RANK CORRELATION 


By S. T. DAVID, M. G. KENDALL AND A. STUART
Division of Research Techniques, London School of Economics 


SUMMARY 


1. This paper deals with the following matters:

(a) We give the actual distribution of Spearman's ρ_s in the null case for rankings of
9 and 10.

(b) We give the moments and cumulants of Spearman's ρ_s in the null case up to those of
the eighth order.

(c) We use these cumulants and the cumulants of τ recently given by Silverstone (1950)
to develop an expansion of the frequency functions of ρ_s and τ in the null case.

(d) We examine the product-moment correlation between r_s and t in samples from
a bivariate normal population.


2. A valuable review of recent work on ranking methods has been given by Moran (1950)
as part of a symposium to which Whitfield (1950) and Daniels (1950) contributed. These
authors called attention to certain undesirable features of the notation of the subject, but
as they all used different notations themselves and some of the contributors to the discussion
used different notations again, we shall adhere to the notation of a previous paper by one of
us (Kendall, 1949), that is to say: Spearman's ρ will be denoted by ρ_s (or r_s if sample values
are in question); the alternative coefficient will be denoted by τ (or t in sample values); and
the correlation parameter in a bivariate normal population will be denoted by ρ.


THE FREQUENCY FUNCTION OF ρ_s


3. The frequency distribution of ρ_s in the null case (i.e. in the population of rankings in
which each ranking occurs equally frequently) was given by Kendall and others in 1939 for
n, the rank number, from 2 to 8 inclusive. Olds (1938) had independently given the distributions
for n = 2 to 7 inclusive. The method of obtaining the distributions used by both
authors is essentially the same and consists of expanding a permanent. The work of explicit
expansion rapidly increases as n becomes larger, and we have not found any methods of
alleviating it other than those given by Kendall et al. in their 1939 paper. The permanent
has now been expanded for n = 9 and n = 10, which is about as far as a computer's patience
can be expected to extend, and the resulting distributions are given in Table 1. The figures
were checked by calculating the second and fourth moments, which are known as polynomials
in n. Table 2 gives the corresponding single-tailed probabilities.
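The null distribution described in § 3 can be illustrated for small n by brute force. The sketch below does not expand a permanent as the authors did; it simply enumerates all n! rankings, which is only practicable for small n. The function names are hypothetical, and the check on the second moment uses the known value 1/(n − 1).

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def null_distribution(n):
    """Frequencies of S(d^2) = sum of squared rank differences over all n! rankings."""
    counts = Counter()
    for perm in permutations(range(n)):
        counts[sum((i - p) ** 2 for i, p in enumerate(perm))] += 1
    return counts


def second_moment(n):
    """Exact second moment of rho_s = 1 - 6 S(d^2) / (n^3 - n) in the null case."""
    counts = null_distribution(n)
    total = sum(counts.values())
    scale = Fraction(6, n**3 - n)
    return sum(Fraction(c, total) * (1 - scale * s) ** 2 for s, c in counts.items())


dist5 = null_distribution(5)
print(sorted(dist5.items()))
print(second_moment(5))  # equals 1/(n - 1)
```

For n = 9 and 10 the same enumeration would visit 362880 and 3628800 rankings respectively, which is feasible on a modern machine but slow in pure Python.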


4. The frequency polygons of the distributions exhibit the characteristic local irregularities
commented upon by Kendall and others and, so far as they go, confirm their conjecture that
as n becomes larger (and the distribution tends to normality) the tails of the distribution
smooth out first.

We are indebted to Miss Joan Ayling, of the National Institute for Social and Economic
Research, who helped in carrying out these calculations with her customary patience and
accuracy.


Table 1. Distribution of S(d²) (for ρ_s) in the null case for n = 9 and n = 10

 Values      Frequency      |  Values      Frequency      |  Values      Frequency
 of S(d²)  n = 9   n = 10   |  of S(d²)  n = 9   n = 10   |  of S(d²)  n = 9   n = 10
    0         1        1    |    60      2878     9892    |   120      6688*   38456
    2         8        9    |    62      2928    10678    |   122        -     38926
    4        21       28    |    64      3397    11647    |   124        -     39984
    6        34       51    |    66      3138    12141    |   126        -     39386
    8        72      107    |    68      3647    13026    |   128        -     42068
   10       102      177    |    70      3568    13918    |   130        -     42848
   12       130      234    |    72      3921    14519    |   132        -     42424
   14       190      360    |    74      3866    15611    |   134        -     42925
   16       260      498    |    76      4311    16278    |   136        -     45044
   18       284      619    |    78      4050    16780    |   138        -     44584
   20       398      819    |    80      4852    18686    |   140        -     46040
   22       454     1040    |    82      4492    19280    |   142        -     45496
   24       555     1252    |    84      4816    19586    |   144        -     46890
   26       616     1528    |    86      4784    20795    |   146        -     47036
   28       756     1824    |    88      5505    22385    |   148        -     47341
   30       744     2010    |    90      4954    22896    |   150        -     47646
   32      1022     2533    |    92      5638    23948    |   152        -     48887
   34      1042     2837    |    94      5304    24970    |   154        -     48840
   36      1159     3180    |    96      5890    26012    |   156        -     48044
   38      1282     3676    |    98      5486    27096    |   158        -     48540
   40      1555     4305    |   100      6188    28467    |   160        -     50066
   42      1392     4493    |   102      5502    28427    |   162        -     48970
   44      1719     5130    |   104      6436    30540    |   164        -     49062†
   46      1758     5672    |   106      5822    31272    |   166        -     49062†
   48      2009     6156    |   108      6233    31774    |
   50      2032     6909    |   110      6024    33264    |
   52      2282     7424    |   112      6697    34748    |
   54      2214     7830    |   114      5720    34499    |
   56      2676     8773    |   116      6672    36299    |
   58      2590     9392    |   118      6020    36596    |

 Totals (of whole distribution):  n = 9, 362880;  n = 10, 3628800














* Modal value. Values of higher S are obtainable by symmetry.
† Modal values. Values of higher S are obtainable by symmetry.


MOMENTS AND CUMULANTS OF ρ_s AND τ


5. Moran (1950) and independently Silverstone (1950) have given an expression for the
cumulants of τ in the null case. Silverstone's form for S, which is related to τ by the equation

$$\tau = \frac{S}{\tfrac{1}{2}n(n-1)}, \tag{1}$$

is as follows:

$$(-1)^{k+1}\kappa_{2k} = \frac{2^{2k-1}B_k}{k}\sum_{j=1}^{n}(j^{2k}-1) = \frac{2^{2k-1}B_k}{k}\left[\frac{B_{2k+1}(n)}{2k+1} + n^{2k} - n\right], \tag{2}$$









Table 2. Probability that S(d²) (for ρ_s) will be attained or exceeded for n = 9 and n = 10

 Values      Probability      |  Values      Probability    |  Values      Probability
 of S(d²)  n = 9     n = 10   |  of S(d²)  n = 9    n = 10  |  of S(d²)  n = 9    n = 10
    0     1.00000   1.00000   |    60     0.9191   0.9755   |   120     0.509    0.786
    2     0.9⁵724   0.9⁶724   |    62     0.9112   0.9728   |   122     0.491*   0.776
    4     0.9⁴752   0.9⁵724   |    64     0.9031   0.9698   |   124       -      0.765
    6     0.9⁴173   0.9⁴895   |    66     0.894    0.9666   |   126       -      0.754
    8     0.9³824   0.9⁴755   |    68     0.885    0.9633   |   128       -      0.743
   10     0.9³625   0.9⁴460   |    70     0.875    0.9597   |   130       -      0.732
   12     0.9³344   0.9³897   |    72     0.865    0.9559   |   132       -      0.720
   14     0.9²899   0.9³833   |    74     0.854    0.9519   |   134       -      0.708
   16     0.9²846   0.9³734   |    76     0.844    0.9476   |   136       -      0.696
   18     0.9²775   0.9³596   |    78     0.832    0.9431   |   138       -      0.684
   20     0.9²696   0.9³426   |    80     0.821    0.9384   |   140       -      0.672
   22     0.9²587   0.9³200   |    82     0.807    0.9333   |   142       -      0.659
   24     0.9²462   0.9²891   |    84     0.795    0.9280   |   144       -      0.646
   26     0.9²309   0.9²857   |    86     0.782    0.9226   |   146       -      0.633
   28     0.9²139   0.9²815   |    88     0.769    0.9169   |   148       -      0.621
   30     0.9893    0.9²764   |    90     0.753    0.9107   |   150       -      0.607
   32     0.9873    0.9²709   |    92     0.740    0.9044   |   152       -      0.594
   34     0.9844    0.9²639   |    94     0.724    0.898    |   154       -      0.581
   36     0.9816    0.9²561   |    96     0.710    0.891    |   156       -      0.567
   38     0.9784    0.9²473   |    98     0.693    0.884    |   158       -      0.554
   40     0.9748    0.9²372   |   100     0.678    0.876    |   160       -      0.541
   42     0.9706    0.9²254   |   102     0.661    0.868    |   162       -      0.527
   44     0.9667    0.9²130   |   104     0.646    0.861    |   164       -      0.514
   46     0.9620    0.9899    |   106     0.628    0.852    |   166       -      0.500†
   48     0.9571    0.9883    |   108     0.612    0.844    |
   50     0.9516    0.9866    |   110     0.595    0.835    |
   52     0.9460    0.9847    |   112     0.578    0.826    |
   54     0.9397    0.9827    |   114     0.560    0.816    |
   56     0.9336    0.9805    |   116     0.544    0.807    |
   58     0.9262    0.9781    |   118     0.526    0.797    |





























* For greater values of S(d²) the probability is the complement of the probability corresponding to
242 − S(d²), e.g. for S(d²) = 200 is 0.0294.

† For greater values of S(d²) the probability is the complement of the probability corresponding to
332 − S(d²), e.g. for S(d²) = 200 is 0.280.


where B_j(n) is the jth Bernoulli polynomial in n and B_j is the jth Bernoulli number. The
actual values up to and including the cumulants of order eight (odd-order cumulants being
zero) are

$$\kappa_2 = \tfrac{1}{18}n(n-1)(2n+5), \tag{3}$$
$$\kappa_4 = -\tfrac{1}{225}n(6n^4 + 15n^3 + 10n^2 - 31), \tag{4}$$
$$\kappa_6 = \tfrac{8}{1323}n(6n^6 + 21n^5 + 21n^4 - 7n^2 - 41), \tag{5}$$
$$\kappa_8 = -\tfrac{8}{675}n(10n^8 + 45n^7 + 60n^6 - 42n^4 + 20n^2 - 93). \tag{6}$$
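The closed forms (3) to (6) can be checked against the general sum in exact arithmetic. The following is a minimal sketch, assuming the modern sign convention for the Bernoulli numbers (B₂ = 1/6, B₄ = −1/30, B₆ = 1/42, B₈ = −1/30), under which the cumulant of order 2k of S is 2^{2k} B_{2k}/(2k) times the sum of (j^{2k} − 1) for j = 1, ..., n. The function names are hypothetical.

```python
from fractions import Fraction

# Bernoulli numbers in the modern sign convention (an assumption of this sketch).
BERNOULLI = {2: Fraction(1, 6), 4: Fraction(-1, 30),
             6: Fraction(1, 42), 8: Fraction(-1, 30)}


def kappa_general(order, n):
    """Cumulant of even order of S from the sum over j of (j^order - 1)."""
    power_sum = sum(j**order - 1 for j in range(1, n + 1))
    return Fraction(2**order) * BERNOULLI[order] / order * power_sum


def kappa_closed(order, n):
    """Closed forms quoted as equations (3) to (6)."""
    if order == 2:
        return Fraction(n * (n - 1) * (2 * n + 5), 18)
    if order == 4:
        return Fraction(-n * (6 * n**4 + 15 * n**3 + 10 * n**2 - 31), 225)
    if order == 6:
        return Fraction(8 * n * (6 * n**6 + 21 * n**5 + 21 * n**4 - 7 * n**2 - 41), 1323)
    if order == 8:
        return Fraction(-8 * n * (10 * n**8 + 45 * n**7 + 60 * n**6
                                  - 42 * n**4 + 20 * n**2 - 93), 675)
    raise ValueError("only orders 2, 4, 6 and 8 are tabulated")


k2 = kappa_general(2, 10)
print(k2, float(kappa_general(4, 10) / k2**2))
```

For n = 10 this gives a variance of 125 for S, and a standardized fourth cumulant close to −0.216.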


6. The second moment of ρ_s in the null case was obtained by 'Student' about forty years
ago, and the fourth moment was obtained by Hotelling & Pabst (1936). There does not appear
to be any simple generating function for either moments or cumulants, and we have obtained
the moments by ascertaining expected values in a manner which is exemplified by the
following derivation of μ₄.

If x and y denote two rankings of n measured from their mean value ½(n + 1), Spearman's
ρ_s may be defined as

$$\rho_s = \frac{\Sigma xy}{\tfrac{1}{12}(n^3-n)}, \tag{7}$$

where x and y vary from −½(n − 1) to ½(n − 1). The determination of the moments of ρ_s about
the mean is then equivalent to finding the expectations of powers of Σxy. We have

$$E(\Sigma xy)^4 = E(\Sigma x^4y^4 + 4\Sigma x^3x\,y^3y + 3\Sigma x^2x^2\,y^2y^2 + 6\Sigma x^2xx\,y^2yy + \Sigma xxxx\,yyyy),$$

where we write Σx²xx, for instance, to represent Σx_i²x_jx_k summed over all unequal values
of i, j, k from −½(n − 1) to ½(n − 1). In terms of augmented symmetric functions (David
& Kendall, 1949) we may write, since x and y vary independently in the null case,

$$E\,\Sigma x^{a_1}y^{a_1}x^{a_2}y^{a_2}\ldots x^{a_r}y^{a_r} = n^{(\Sigma p_i)}E(x^{a_1}y^{a_1}\ldots x^{a_r}y^{a_r}) = [a_1^{p_1}a_2^{p_2}\ldots a_r^{p_r}]^2/n^{(\Sigma p_i)}, \tag{8}$$

where there are p_i superscripts a_i, and hence

$$E(\Sigma xy)^4 = \frac{[4]^2}{n} + \frac{4[31]^2}{n^{(2)}} + \frac{3[2^2]^2}{n^{(2)}} + \frac{6[21^2]^2}{n^{(3)}} + \frac{[1^4]^2}{n^{(4)}}. \tag{9}$$

We now express the symmetric functions in terms of monomial symmetrics, using for the
purpose the tables of David & Kendall (1949), e.g.

$$[4] = (4), \qquad -[2^2] = (4) - (2)^2, \quad \text{etc.}$$

We also have by summing the appropriate sums of powers

$$(4) = \frac{3n^2-7}{20}\cdot\frac{n(n^2-1)}{12}, \qquad (2) = \frac{n(n^2-1)}{12}, \quad \text{etc.}$$

On substitution in (9) we then find, after some purely algebraical reduction,

$$E(\Sigma xy)^4 = \frac{(n-1)n^3(n+1)^2}{172800}(25n^4 - 13n^3 - 73n^2 + 37n + 72),$$

whence

$$E(\rho_s^4) = \frac{3(25n^3 - 38n^2 - 35n + 72)}{25n(n-1)^3(n+1)}.$$

The algebra rapidly becomes more complicated as the order of the moment increases. The
calculation of the eighth moment involved numbers of thirteen digits.
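The fourth-moment result of § 6 can be checked for small n without any symmetric-function algebra. The sketch below is a brute-force check, not the authors' method: it takes x as the centred ranks of definition (7), lets y run over every permutation of them, and compares the exact mean of (Σxy)⁴ and of ρ_s⁴ with the two closed forms. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def centred_ranks(n):
    """Ranks 1..n measured from their mean (n + 1)/2."""
    return [Fraction(2 * i - n - 1, 2) for i in range(1, n + 1)]


def exact_fourth_moments(n):
    """Exact E(sum xy)^4 and E(rho_s^4) over all n! equally likely rankings."""
    x = centred_ranks(n)
    total_sxy4 = Fraction(0)
    count = 0
    for y in permutations(x):
        sxy = sum(a * b for a, b in zip(x, y))
        total_sxy4 += sxy**4
        count += 1
    e_sxy4 = total_sxy4 / count
    e_rho4 = e_sxy4 / Fraction(n**3 - n, 12) ** 4
    return e_sxy4, e_rho4


def formula_sxy4(n):
    poly = 25 * n**4 - 13 * n**3 - 73 * n**2 + 37 * n + 72
    return Fraction((n - 1) * n**3 * (n + 1) ** 2 * poly, 172800)


def formula_rho4(n):
    poly = 25 * n**3 - 38 * n**2 - 35 * n + 72
    return Fraction(3 * poly, 25 * n * (n - 1) ** 3 * (n + 1))


for n in range(2, 7):
    print(n, exact_fourth_moments(n), formula_sxy4(n), formula_rho4(n))
```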


7. Proceeding in this way we find for the even-order moments of ρ_s

$$\mu_2 = \frac{1}{n-1}, \tag{10}$$

$$\mu_4 = \frac{3(25n^3 - 38n^2 - 35n + 72)}{25n(n+1)(n-1)^3}, \tag{11}$$

$$\mu_6 = \frac{3}{245n^3(n-1)^5(n+1)^3}\{1225n^8 - 4361n^7 - 178n^6 + 23818n^5 - 22783n^4 - 50081n^3 + 54280n^2 + 44160n - 28800\}, \tag{12}$$

$$\mu_8 = \frac{3}{875n^5(n-1)^7(n+1)^5}\{30625n^{13} - 218050n^{12} + 451718n^{11} + 1090534n^{10} - 6275976n^9 + 2142858n^8 + 30402746n^7 - 27330110n^6 - 79689881n^5 + 71871632n^4 + 110888256n^3 - 74721024n^2 - 51867648n + 40642560\}. \tag{13}$$

From these results we have for the cumulants of ρ_s

$$\kappa_2 = \frac{1}{n-1}, \tag{14}$$

$$\kappa_4 = -\frac{6(19n^2 + 5n - 36)}{25n(n+1)(n-1)^3}, \tag{15}$$

$$\kappa_6 = \frac{48(583n^6 + 723n^5 - 2603n^4 - 2637n^3 + 4054n^2 + 2760n - 1800)}{245n^3(n-1)^5(n+1)^3}, \tag{16}$$

$$\kappa_8 = \frac{144}{875n^5(n-1)^7(n+1)^5}\{-41939n^{10} - 83709n^9 + 304254n^8 + 578442n^7 - 1012323n^6 - 1690125n^5 + 1800776n^4 + 2358048n^3 - 1616688n^2 - 1080576n + 846720\}. \tag{17}$$

It may be noted that, in terms of κ₂, we have, to the highest order in n, for τ,

$$\kappa_4 \sim -\frac{54}{25}\,\frac{\kappa_2^2}{n} = -2.16\,\kappa_2^2/n, \tag{18}$$

$$\kappa_6 \sim \frac{1296}{49}\,\frac{\kappa_2^3}{n^2} \approx 26.45\,\kappa_2^3/n^2, \tag{19}$$

$$\kappa_8 \sim -\frac{3888}{5}\,\frac{\kappa_2^4}{n^3} = -777.6\,\kappa_2^4/n^3, \tag{20}$$

indicating a fairly rapid approach to normality. For ρ_s we have

$$\kappa_4 \sim -\frac{114}{25}\,\frac{\kappa_2^2}{n} = -4.56\,\kappa_2^2/n, \tag{21}$$

$$\kappa_6 \sim \frac{27984}{245}\,\frac{\kappa_2^3}{n^2} \approx 114.22\,\kappa_2^3/n^2, \tag{22}$$

$$\kappa_8 \sim -\frac{6039216}{875}\,\frac{\kappa_2^4}{n^3} \approx -6902\,\kappa_2^4/n^3, \tag{23}$$

showing that the tendency is a good deal slower than for τ.

8. The values of the standardized cumulants for n = 10 are as follows:

               ρ_s            τ
 κ₂          0.111,111     0.061,728,4
 κ₄/κ₂²     −0.464        −0.216,090
 κ₆/κ₂³      1.139,526     0.257,254
 κ₈/κ₂⁴     −6.560,788    −0.732,829














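As far as terms of order n⁻³ the Edgeworth expansion of a standardized symmetrical
frequency function is

$$f(x) = \exp\left[\frac{\kappa_4D^4}{4!} + \frac{\kappa_6D^6}{6!} + \frac{\kappa_8D^8}{8!}\right]\alpha(x), \tag{24}$$

where $D = d/dx$, $\alpha(x) = (2\pi)^{-1/2}e^{-\frac{1}{2}x^2}$, and this becomes

$$f(x) = \left\{1 + \frac{\kappa_4}{24}D^4 + \left(\frac{\kappa_6}{720}D^6 + \frac{\kappa_4^2}{1152}D^8\right) + \left(\frac{\kappa_8}{40320}D^8 + \frac{\kappa_4\kappa_6}{17280}D^{10} + \frac{\kappa_4^3}{82944}D^{12}\right)\right\}\alpha(x)$$

$$= \alpha(x)\left\{1 + \frac{\kappa_4}{24}H_4 + \left(\frac{\kappa_6}{720}H_6 + \frac{\kappa_4^2}{1152}H_8\right) + \left(\frac{\kappa_8}{40320}H_8 + \frac{\kappa_4\kappa_6}{17280}H_{10} + \frac{\kappa_4^3}{82944}H_{12}\right)\right\}, \tag{25}$$

where H_r is the Tchebycheff-Hermite polynomial defined by

$$(-D)^r\alpha(x) = H_r(x)\,\alpha(x).$$

The distribution function, which is more to the point for our purpose, is given by

$$F(x) = \int_{-\infty}^{x}\alpha(x)\,dx - \alpha(x)\left\{\frac{\kappa_4}{24}H_3 + \left(\frac{\kappa_6}{720}H_5 + \frac{\kappa_4^2}{1152}H_7\right) + \left(\frac{\kappa_8}{40320}H_7 + \frac{\kappa_4\kappa_6}{17280}H_9 + \frac{\kappa_4^3}{82944}H_{11}\right)\right\}. \tag{26}$$

We shall need to consider values of x not greater than about 3. The explicit expressions for
the H's we require are

$$\begin{aligned}
H_3 &= x^3 - 3x,\\
H_5 &= x^5 - 10x^3 + 15x,\\
H_7 &= x^7 - 21x^5 + 105x^3 - 105x,\\
H_9 &= x^9 - 36x^7 + 378x^5 - 1260x^3 + 945x,\\
H_{11} &= x^{11} - 55x^9 + 990x^7 - 6930x^5 + 17325x^3 - 10395x.
\end{aligned} \tag{27}$$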





9. We now compare the values of the distribution function given by (26) with the exact
values for n = 10 in the neighbourhood of the significance points 5, 1 and 0.1 %.

A preliminary remark is necessary concerning corrections for continuity. In the usual test
for the score S in the τ-distribution the deviate is standardized by division by the standard
deviation √{(1/18)n(n − 1)(2n + 5)} after being reduced by unity (cf. Kendall, 1948, p. 41). We
improve on this by correcting the variance by a Sheppard correction, and similarly correct
the other cumulants required in the computations. Thus, for n = 10, the variance of S is 125.
With a Sheppard correction this becomes 124.6667 (successive values of S differ by 2, so that
the correction is 2²/12 = 1/3). The deviate appropriate to a value of
S of 19 is then 18/√124.6667 = 1.612. The position is similar for Spearman's ρ_s. The
(corrected) cumulants of ρ_s and τ are:

               ρ_s         τ
 κ₄/κ₂²     −0.4641    −0.2172
 κ₆/κ₂³      1.1399     0.2593
 κ₈/κ₂⁴     −6.5636    −0.7406

















These are the values used in applying equation (26), but, in fact, apart from the initial term, 
the uncorrected values of § 8 would have given the same final numerical results. 







10. Consider τ in the first place. For the complement of the distribution function of
equation (26) we have

$$1 - F(x) = \int_{x}^{\infty}\alpha(x)\,dx + \alpha(x)\{-0.0^{2}905H_3 + 0.0^{3}36H_5 + 0.0^{4}4H_7 - 0.0^{4}18H_7 - 0.0^{5}3H_9 - 0.0^{6}1H_{11}\}. \tag{28}$$

The following Table 3 shows the exact values of the probability, the deviate x₀ and the
successive approximations P₁, P₂, P₃ and P₄ obtained by using one, two, four and seven terms
in (28), i.e. by taking terms of order n⁰, n⁻¹, n⁻² and n⁻³ respectively.


Table 3. Approximations to the distribution function of τ (case, n = 10)

   S      x₀        P₁         P₂         P₃         P₄        Exact
  19    1.612    0.05348    0.05412    0.05416    0.05417    0.05416
  21    1.791    0.03665    0.03638    0.03632    0.03630    0.03628
  25    2.149    0.01582    0.01457    0.01438    0.01435    0.01430
  27    2.329    0.0²993    0.0²858    0.0²838    0.0²834    0.0²833
  33    2.865    0.0²2085   0.0²1196   0.0²1117   0.0²1110   0.0²1106
  35    3.045    0.0²1160   0.0³491    0.0³459    0.0³464    0.0³473








The approximation P, is very good, considering that n is only 10 and that we are examining 
the tails of a discontinuous distribution. Even P, is very fair and would not lead to incorrect 
inferences in practice. For larger values of n the approximations would doubtless be closer. 
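The first row of Table 3 can be reproduced from expansion (26) directly. This is a minimal sketch, assuming the corrected standardized cumulants of τ quoted in § 9 (−0.2172, 0.2593, −0.7406) and the continuity and Sheppard corrections described there; the function names are hypothetical and the Hermite polynomials are generated by the standard recurrence H_{r+1} = xH_r − rH_{r−1} rather than typed out.

```python
import math

# Corrected standardized cumulants of tau for n = 10, as quoted in section 9.
K4, K6, K8 = -0.2172, 0.2593, -0.7406


def hermite(r, x):
    """Tchebycheff-Hermite polynomial H_r(x) by H_{r+1} = x H_r - r H_{r-1}."""
    h_prev, h = 1.0, x
    if r == 0:
        return h_prev
    for k in range(1, r):
        h_prev, h = h, x * h - k * h_prev
    return h


def alpha(x):
    """Standard normal frequency function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def deviate(score, n=10):
    """Deviate for S: reduce by unity, divide by the Sheppard-corrected s.d."""
    variance = n * (n - 1) * (2 * n + 5) / 18.0 - 1.0 / 3.0
    return (score - 1) / math.sqrt(variance)


def tail_approximations(x, k4=K4, k6=K6, k8=K8):
    """Return (P1, P2, P3, P4): 1 - F(x) to orders n^0, n^-1, n^-2, n^-3."""
    order1 = k4 / 24 * hermite(3, x)
    order2 = k6 / 720 * hermite(5, x) + k4**2 / 1152 * hermite(7, x)
    order3 = (k8 / 40320 * hermite(7, x) + k4 * k6 / 17280 * hermite(9, x)
              + k4**3 / 82944 * hermite(11, x))
    p1 = 0.5 * math.erfc(x / math.sqrt(2.0))
    p2 = p1 + alpha(x) * order1
    p3 = p2 + alpha(x) * order2
    p4 = p3 + alpha(x) * order3
    return p1, p2, p3, p4


x0 = deviate(19)
print(round(x0, 3), [round(p, 5) for p in tail_approximations(x0)])
```

Because the deviate is carried at full precision rather than rounded to three decimals, the last digit of each approximation may differ slightly from the tabulated value.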

11. Similar calculations were carried out for ρ_s for n = 10 with the results shown in
Table 4. The series corresponding to (28) was

$$1 - F(x) = \int_{x}^{\infty}\alpha(x)\,dx + \alpha(x)\{-0.01933H_3 + 0.0^{2}158H_5 + 0.0^{3}187H_7 - 0.0^{3}163H_7 - 0.0^{4}3H_9 - 0.0^{5}12H_{11}\}. \tag{29}$$


Table 4. Approximations to the distribution function of ρ_s (case, n = 10)

  S(d²)    x₀        P₁         P₂         P₃         P₄        Exact
   256    1.636    0.05094    0.05201    0.05219    0.05225    0.05244
   258    1.673    0.04719    0.04783    0.0479?    0.04794    0.04814
   286    2.182    0.01456    0.01182    0.01096    0.01064    0.01012
   288    2.218    0.01327    0.01046    0.0²959    0.0²926    0.0²870
   308    2.582    0.0²491    0.0²230    0.0²160    0.0²135    0.0²109
   310    2.618    0.0²442    0.0²189    0.0²123    0.0³998    0.0³800











It was to be expected that the agreement for ρ_s would be less satisfactory than for τ.
Nevertheless, the approximation represented by P, is good and would be quite satisfactory 
in practice. In fact, since S(d*) is discontinuous P, would indicate quite correctly which pair 
of values bracketed the significance levels. When it is remembered that we are dealing with 
the tails of a discontinuous distribution of finite range, the approximation seems remarkably 
good. 

THE RELATIONSHIP BETWEEN r_s AND t IN NORMAL SAMPLES


12. In the null case the distribution of r_s and t tends to bivariate normality for large n with
a correlation which fairly rapidly tends to unity. On the basis of this result and the known
expressions for the variances of r_s and t in the null case, Kendall has remarked that for
moderate values of parental correlation r_s in practice may be expected to be about 1.5t.
Daniels (1950) has, however, quite rightly pointed out that it may be unjustifiable to
generalize from the null case in this way if the parent is heterogeneous. He proves the
interesting inequality for large n

$$-1 \le 3t - 2r_s \le 1, \tag{30}$$

and shows that if t > 0 the upper limit can be attained but not the lower limit, if t < 0 the
reverse is true and if t = 0 both limits are attainable. He goes on to remark that in certain
cases it is possible to have a population for which τ = 0 but ρ_s = ±½, and hence that τ and
ρ_s may be measuring different aspects of the population.


13. That such a case is possible is undeniable; but in what might be called 'respectable'
variation, namely, when the parent is capable of representation by measurable variates
which are fairly close to joint normality, the correlation between r_s and t remains quite high.
We shall establish this result by extending the method used by Kendall (1949) to obtain the 
variance of r, in normal samples. Incidentally, we extend his expansion as far as powers of 
p” and correct a small error in his coefficient of p*. 

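In the usual notation we have

$$t = \frac{1}{n(n-1)}\Sigma a_{ij}b_{ij}, \tag{31}$$

$$r_s = \frac{3}{n(n^2-1)}\Sigma a_{ij}b_{ik}, \tag{32}$$

and in normal samples

$$E(t) = E(a_{ij}b_{ij}) = \frac{2}{\pi}\sin^{-1}\rho, \tag{33}$$

$$E(a_{ij}b_{ik}) = \frac{2}{\pi}\sin^{-1}\tfrac{1}{2}\rho, \tag{34}$$

$$E(r_s) = \frac{6}{\pi(n+1)}\{\sin^{-1}\rho + (n-2)\sin^{-1}\tfrac{1}{2}\rho\}. \tag{35}$$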

For the variance of r, we have to order n- 
9 
varr, = — [— {H*(a;5bi4)} + Lai; bin Gin Dip) 


+ 4 (0556 pm Ppp) + 2E (Aj bgp Mm Oy.) + 2 (A455: mby)], (38) 
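and for the variance of t,

$$\operatorname{var} t = \frac{2}{n(n-1)}\left[1 - \left(\frac{2}{\pi}\sin^{-1}\rho\right)^2 + 2(n-2)\left\{\frac{1}{9} - \left(\frac{2}{\pi}\sin^{-1}\tfrac{1}{2}\rho\right)^2\right\}\right], \tag{37}$$

or to order n⁻¹

$$\operatorname{var} t = \frac{4}{n}\left\{\frac{1}{9} - \left(\frac{2}{\pi}\sin^{-1}\tfrac{1}{2}\rho\right)^2\right\}. \tag{38}$$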


We then find, by the method described by Kendall (1949), 
s 2\*T/p\? 1/p\*. 8 (/p fy als) | 
1 wit P “fF fk i —— |= 9 
B(aybu) = (- -_ ie) (;) [(4) +3(8) +35(5) +35(5 +T576\o) | 8 


2\2n? 2/p\2 50/p\* 784 (p\* 55,112/p\® 26,050,816 ey 
H(A:sbedimPsy) = (- =) [Fe 3(4) +ar(4) tz a5) +Feaas (5) *31,000,725\2) |’ 


(40) 





36) 


37) 


38) 





S. T. Davin, M. G. Kenpaty anp A. STUART 139 


restos CY JSC) 3G) ol SG) BBE] oo 








(44536 :4¢ 4m Oy) = 2E(A4;b 24m Op); (42) 
2\2 2 5 2 379 6 869/p\® 222,287 /p\? ; 
E(a:50 Am by) = (-) | (6) +3(6) +300 (4) +550 (5) + Too awa (3) sia 


whence, on substitution in (36),

$$\operatorname{var} r_s = \frac{1}{n}(1 - 1.56346496\rho^2 + 0.30474282\rho^4 + 0.15528569\rho^6 + 0.06155166\rho^8 + 0.02209922\rho^{10}). \tag{44}$$

By expanding (38) we find also

$$\operatorname{var} t = \frac{4}{9n}(1 - 0.91189065\rho^2 - 0.07599089\rho^4 - 0.01013212\rho^6 - 0.00162837\rho^8 - 0.00028949\rho^{10}). \tag{45}$$


14. We now evaluate the covariance. We have 


cov (r,,t) = E(r,t) — E(r,) E(t) (46) 
4 (443543) + - ay; ~ bin 
and E(r,t) = B& 





Fala)” 
which is equivalent to 
$n*(n — 1)? (n+ 1) E(ret) = BX bij my bm, 
where the suffices 1,7}, m, k, 1 all range from 1 to n. To order n‘ the right-hand side gives terms 
of four types, namely, 


MOE (055544, Om) + NOE (055055 Amy Omy) + 4NOE (055655 On Omi) + 2nE (a4; ;;4;4 59); 
where suffices with different letters are now different. Thus, to order n-, 


3 
On substitution in (46) we find to order n-1 
3 
cov (r,t) = fs [4.8 (455; O94, 0g) + 2E (04; 5;5044.02) — 6B (04; 6,5) (Amy Ony))- (48) 
It remains to evaluate these expectations. In the manner followed by Kendall (1949) we find 
aa 1 p\? 40/p\* 368/p\* 17,824 /p\® eo (6) 
E(a:50:32mt Omi) = 7a | 12 (6) "|. (6) *75 (6 ) +315 (2 +727,675 (49) 


1 2 14 2 470 4 12,208 6 
E(4,56;;4;,bg) = =) 2 (6) +— (5) + (6) 








9° 3 \2) * 31 \2) + T2165 \2 
1,653,848 /p\® _ 1,680,524,032 A) 
+ 76,5465 (6) +~ 31,000,725 (6 ss 


Hence, on substitution in (48), we find

$$\operatorname{cov}(r_s, t) = \frac{2}{3n}(1 - 1.24858961\rho^2 + 0.06830496\rho^4 + 0.07280483\rho^6 + 0.04025526\rho^8 + 0.01641362\rho^{10}). \tag{51}$$

To evaluate the correlation between r_s and t it is tempting to expand cov(r_s, t)/√(var r_s var t)
as a power series in ρ. The convergence of such an expansion, however, is less satisfactory than
that of the three series for cov(r_s, t), var r_s and var t separately, and it is best to evaluate them
directly. The results for a range of values of ρ from 0 to 0.9 are shown in Table 5.











Table 5. Correlation between r_s and t in normal samples for large n

         var r_s from (44)   var t from (45)   cov (r_s, t) from (51)   Correlation,
   ρ         1/n ×              4/9n ×              2/3n ×              r_s and t
  0.0      1.000,000          1.000,000           1.000,000             1.000,000
  0.1      0.984,396          0.990,873           0.987,521             0.999,89
  0.2      0.937,959          0.963,402           0.950,170             0.999,55
  0.3      0.861,874          0.917,307           0.888,236             0.998,96
  0.4      0.758,326          0.852,109           0.802,301             0.998,07
  0.5      0.630,869          0.767,123           0.693,433             0.996,79
  0.6      0.485,060          0.661,369           0.563,490             0.994,87
  0.7      0.329,512          0.533,688           0.415,941             0.991,86
  0.8      0.177,612          0.382,303           0.256,482             0.984,28
  0.9      0.050,261,8        0.205,320           0.095,200,4           0.937,14























The correlation, as will be seen, is very high until ρ itself approaches unity. In the neighbourhood
of ρ = 1 the covariance of r_s and t tends to zero (because their variances tend to
zero), but each coefficient, of course, will tend to unity. For large n, therefore, the ratios of
the coefficients will not differ very much from the ratio of their expectations, namely,
3 sin⁻¹ ½ρ / sin⁻¹ ρ, which varies from 1.5 in the neighbourhood of ρ = 0 to unity when ρ = 1.
When ρ = 0.6 the ratio is 1.42.
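The entries of Table 5 follow from the three series (44), (45) and (51) evaluated separately, as recommended above. The sketch below is an illustration with hypothetical function names. Since the leading factors 1/n, 4/9n and 2/3n satisfy (2/3)/√(4/9) = 1, they cancel in the correlation for large n.

```python
import math

# Coefficients of rho^0, rho^2, ..., rho^10 in the bracketed series.
VAR_RS = (1.0, -1.56346496, 0.30474282, 0.15528569, 0.06155166, 0.02209922)   # (44)
VAR_T = (1.0, -0.91189065, -0.07599089, -0.01013212, -0.00162837, -0.00028949)  # (45)
COV = (1.0, -1.24858961, 0.06830496, 0.07280483, 0.04025526, 0.01641362)      # (51)


def series(coefficients, rho):
    """Evaluate a series in even powers of rho."""
    return sum(c * rho ** (2 * k) for k, c in enumerate(coefficients))


def correlation(rho):
    """Large-sample correlation between r_s and t; the 1/n factors cancel."""
    return series(COV, rho) / math.sqrt(series(VAR_RS, rho) * series(VAR_T, rho))


def expectation_ratio(rho):
    """Large-sample ratio E(r_s)/E(t) = 3 asin(rho/2) / asin(rho), rho != 0."""
    return 3.0 * math.asin(rho / 2.0) / math.asin(rho)


for rho in (0.1, 0.5, 0.9):
    print(rho, round(correlation(rho), 5), round(expectation_ratio(rho), 3))
```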


REFERENCES 


Daniels, H. E. (1950). J. R. Statist. Soc. B, 12, 171.
David, F. N. & Kendall, M. G. (1949). Biometrika, 36, 431.
Hotelling, H. & Pabst, M. R. (1936). Ann. Math. Statist. 7, 29.
Kendall, M. G. (1948). Rank Correlation Methods. London: Charles Griffin and Co.
Kendall, M. G. (1949). Biometrika, 36, 177.
Kendall, M. G. et al. (1939). Biometrika, 30, 251.
Olds, E. G. (1938). Ann. Math. Statist. 9, 133.
Moran, P. A. (1950). J. R. Statist. Soc. B, 12, 153.
Silverstone, H. (1950). Biometrika, 37, 231.
Whitfield, J. W. (1950). J. R. Statist. Soc. B, 12, 163.







NOTE ON AN EXACT TREATMENT OF CONTINGENCY, GOODNESS 
OF FIT AND OTHER PROBLEMS OF SIGNIFICANCE 


By G. H. FREEMAN AND J. H. HALTON


I. The present treatment was evolved because of the inaccuracy of the use of χ² tables
when expected and observed numbers are small. The method is exact and applies to any 
sampled array of numbers, whatever the population from which the sampling is effected, 
provided that: 

(a) either the parent population is infinite, or the sampling is done with replacement of 
the sampled members; 

(b) the sampling is random. 

The treatment may also be used to determine the significance of effects whose nature is 
such that Pearson's χ² test is not applicable.

The only difficulty encountered in the use of the method outlined below is the amount of 
labour involved in obtaining the results, and this sets an upper limit to the size of the sample 
that can be dealt with in a reasonable time, using hand-operated or semi-automatic machines. 


II. Consider any population whose members may be subjected to a complete and exclusive
classification into a finite number, r, of groups. Then the probability that out of a random
sample of n members, l_i will fall into the ith group (i = 1, 2, ..., r) is

$$P_L = n!\prod_{i=1}^{r}\frac{p_i^{l_i}}{l_i!},$$

where p_i is the probability that a randomly selected member will belong to the ith group.
Putting λ_i = np_i, this becomes

$$P_L = \frac{n!}{n^n}\prod_{i=1}^{r}\frac{\lambda_i^{l_i}}{l_i!}, \tag{1}$$

where λ_i is the number of members of the sample expected to fall into the ith group.
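Expression (1) is the multinomial probability written in terms of expected numbers. A minimal sketch in exact arithmetic follows; the function name is hypothetical, and it is assumed that the expected numbers sum to the sample size n.

```python
from fractions import Fraction
from math import factorial


def array_probability(observed, expected):
    """P_L = (n!/n^n) * product of lambda_i^l_i / l_i!, with sum(expected) == n assumed."""
    n = sum(observed)
    prob = Fraction(factorial(n), n**n)
    for l_i, lam_i in zip(observed, expected):
        prob *= Fraction(lam_i) ** l_i / factorial(l_i)
    return prob


# Three members, two groups with expected numbers 1 and 2 (p = 1/3, 2/3).
print(array_probability([1, 2], [1, 2]))
```

Summing the function over every possible observed array with the same n should give unity, which provides a convenient check on any hand computation.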

In addition, should the choice of the sample considered, although otherwise random, be
restricted by a number of conditions C₁, C₂, ..., C_q, whose a priori probabilities are respectively
p_{C₁}, p_{C₂}, ..., p_{C_q}, the expression (1) must be divided by their product, giving

$$P_L = \frac{n!}{n^n}\prod_{i=1}^{r}\frac{\lambda_i^{l_i}}{l_i!}\Big/\prod_{j=1}^{q}p_{C_j}. \tag{2}$$


III. Similarly, if a population is subjected to k different and independent classifications,
the corresponding probability of obtaining an array L(l_{i₁i₂...i_k}) by random sampling is

$$P_L = \frac{n!}{n^n}\prod_{i_1=1}^{r_1}\prod_{i_2=1}^{r_2}\cdots\prod_{i_k=1}^{r_k}\frac{\lambda_{i_1i_2\ldots i_k}^{\,l_{i_1i_2\ldots i_k}}}{l_{i_1i_2\ldots i_k}!}. \tag{3}$$

For most purposes, only relationships internal to the contingency table formed for the
sample considered are of interest. In such cases, the numbers of members of the sample in
the i_m-th group of the mth classification (i_m = 1, 2, ..., r_m; m = 1, 2, ..., k), i.e. the 'border
totals', may be taken as fixed. If this is done, and if further:

$$a_{mi_m} = \sum_{i_1=1}^{r_1}\sum_{i_2=1}^{r_2}\cdots\sum_{i_{m-1}=1}^{r_{m-1}}\;\sum_{i_{m+1}=1}^{r_{m+1}}\cdots\sum_{i_k=1}^{r_k} l_{i_1i_2\ldots i_k},$$











it is found, on substituting the a priori probability of obtaining arrays A,,(a,,;,,) for the 





errors TT new!) — | (Ut TH. TL Oe) 
Bie (=) m= fan a G=tGe1 Goi " (4) 
! vr; vs; a 
ge. 5 » Th laa. a!) TL Ti (asi) 
4,=174,=1 n= lin=1 


Finally, if it is assumed that the population is EE I wea an assumption which is 
equivalent to 





IT (Qmim) 
Ne ign te = a. , (5) 
then, on substitution into (4), it is found that 
k fn 
II II (Gin!) 
P,, = =a ~ : (6) 


(n!)1 T] TL... TL Casts! 


4,=1%,=1 = 


IV. A special case of importance is that in which k = 2. Putting i for i₁, j for i₂, r for r₁,
s for r₂, a_i for a_{1i₁} and b_j for a_{2i₂}, the expressions (4) and (6) become respectively


e TH) 1) { TT as) 





(7) 
| (ae) I (op) 


— 
= 
~-~ 
~ 
= 
— 
tay. 


and
$$P_L = \frac{\prod_{i=1}^{r}(a_i!)\prod_{j=1}^{s}(b_j!)}{n!\prod_{i=1}^{r}\prod_{j=1}^{s}(l_{ij}!)}. \tag{8}$$





When r = s = 2 we get the known result for the exact treatment of 2 x 2 tables (see, for 
instance, R. A. Fisher’s Statistical Methods for Research Workers, §21-02). 

Further, the k-variate contingency table, when assumed to be homogeneous, may be 
looked upon from the point of view of the heterogeneity due to one classification only, say 
the hth. In that case, the contingency table may be looked upon as a bivariate contingency
table, the two classifications being the hth classification on the one hand, and the (k − 1)
other classifications on the other hand. 

Putting Oo ty. tecrtars-@ ™ 5 (Ls, tg... te)> 
it will be seen that ,,_ ,, ea <9 = 
II II eee II Il ° Tt (dé .. eve th-1thi1 te!) TT (Qn:,!) 


P _ t4,=1i,=1 in-i=1 thyi=l izg=1 
= oo 


mn TL TE - TL Cas..e!) 


4,=1%1,=1 in=1 





(9) 


V. The actual test is carried out as follows:
(i) All arrays X^(t)(x_{i₁i₂...i_k}), subject to the same general conditions as the observed array
L(l_{i₁i₂...i_k}), are written out.
(ii) The corresponding a priori probabilities P_{X^(t)} are calculated by means of the appropriate
probability expression.
(iii) The values of P_{X^(t)} satisfying the following equation are noted:

$$P_{X^{(t)}} \le P_L. \tag{10}$$







These are the probabilities of all arrays X which are a priori as probable as, or less probable
than, the array L.
(iv) The probabilities satisfying (10), which include P_L itself, are added together to give
a probability φ_L, the summation being over values of t satisfying (10), thus

$$\phi_L = \sum_{t}(P_{X^{(t)}}). \tag{11}$$

It will be seen that the resulting φ_L is the total a priori probability of obtaining an array
as probable (individually) as, or less probable than, the observed array L.

Now φ_L satisfies the following conditions:

(a) φ_L has one and only one value for every array L;

(b) φ_L has its maximum value, unity, when L is as close to the expected array Λ as is
allowed by the condition that the observed numbers are integers;

(c) φ_L ≥ φ_{L'} if (but not only if) every observed number in L is as close to the corresponding
expected number in Λ as is the corresponding number in L', or closer.

We may thus conclude that the magnitude of φ_L is a measure of the goodness of fit of the
observed array L to the expected array Λ.

The χ² tables normally used involve the assumption that

$$P_L = Ce^{-\frac{1}{2}(\chi^2)_L} \tag{12}$$

(where C is a constant and (χ²)_L is the 'χ²' calculated for the array L), which holds with
increasing accuracy as observed and expected numbers become larger. Now, all arrays such
that (10) holds are arrays such that

$$(\chi^2)_{X^{(t)}} \ge (\chi^2)_L. \tag{13}$$

Thus the probability obtained from χ² tables is an approximation to φ_L, which becomes closer
as observed and expected numbers become larger.


VI. When the test is carried out in practice it is found that none of the operations mentioned 
in §V above is very lengthy, as the following two examples show. The working of these 
examples is done more fully than would be the case in practice, in order to clarify the 
exposition. The first example is of a (2 x 3) table and the second of a (2 x 2 x 2) table, both 
being assumed homogeneous. 

(i) The sample consists of fifty candidates who had passed through a training establishment
having previously passed successfully through a selection centre. The candidates are 
classified by selection centre grade A, B, or C, by training establishment grade X or Y and 
also by an educational factor M or N. (Totals are given in parentheses, totals of totals in 
brackets and the overall figure is in black.) 


















































            Education M                  Education N                     Overall
        C    B    A                  C    B    A                   C     B     A
  X     0    3    2    (5)     X     8    4    0   (12)     X     (8)   (7)   (2)   [17]
  Y     6    5    1   (12)     Y    16    5    0   (21)     Y    (22)  (10)   (1)   [33]
       (6)  (8)  (3)  [17]         (24)  (9)  (0)  [33]          [30]  [17]   [3]    50



























The first step is to test each of the above three bivariate distributions for heterogeneity.
This is done by means of the χ² test in the second and third tables, in the latter pooling the
A and B groups as this has little effect on the heterogeneity. The corresponding values of
χ² are 0.3492 (D.F. = 1) and 1.7974 (D.F. = 1), respectively, giving 0.50 < P < 0.70 and
0.10 < P < 0.20.

In the first table no pooling will make the χ² tables reliable as the numbers are too small
and so the present method is resorted to. The table is assumed to be homogeneous, so that 
expression (8) is the appropriate one. The procedure is described below and follows the lines 
indicated in §V. The first step is to write out all the arrays subject to the same general 
conditions as L, the observed array. In following the method used in this example it is best 
to do this in full, but in the alternative method used for the second example here this is 
unnecessary, for reasons given in §VI (ii) below. The arrays, 18 in number, of which the 
second is the observed array L, are written out as follows: 


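$$X^{(1)}\begin{pmatrix}0&2&3\\6&6&0\end{pmatrix},\quad L = X^{(2)}\begin{pmatrix}0&3&2\\6&5&1\end{pmatrix},\quad X^{(3)}\begin{pmatrix}0&4&1\\6&4&2\end{pmatrix},\quad X^{(4)}\begin{pmatrix}0&5&0\\6&3&3\end{pmatrix},$$
$$X^{(5)}\begin{pmatrix}1&1&3\\5&7&0\end{pmatrix},\quad X^{(6)}\begin{pmatrix}1&2&2\\5&6&1\end{pmatrix},\quad X^{(7)}\begin{pmatrix}1&3&1\\5&5&2\end{pmatrix},\quad X^{(8)}\begin{pmatrix}1&4&0\\5&4&3\end{pmatrix},$$
$$X^{(9)}\begin{pmatrix}2&0&3\\4&8&0\end{pmatrix},\quad X^{(10)}\begin{pmatrix}2&1&2\\4&7&1\end{pmatrix},\quad X^{(11)}\begin{pmatrix}2&2&1\\4&6&2\end{pmatrix},\quad X^{(12)}\begin{pmatrix}2&3&0\\4&5&3\end{pmatrix},$$
$$X^{(13)}\begin{pmatrix}3&0&2\\3&8&1\end{pmatrix},\quad X^{(14)}\begin{pmatrix}3&1&1\\3&7&2\end{pmatrix},\quad X^{(15)}\begin{pmatrix}3&2&0\\3&6&3\end{pmatrix},\quad X^{(16)}\begin{pmatrix}4&0&1\\2&8&2\end{pmatrix},$$
$$X^{(17)}\begin{pmatrix}4&1&0\\2&7&3\end{pmatrix},\quad X^{(18)}\begin{pmatrix}5&0&0\\1&8&3\end{pmatrix}.$$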


The ordering of the arrays above is done on the following principle. In any $(r_1 \times r_2 \times \dots \times r_k)$
contingency table in which only the border totals, and hence the overall total, are fixed there
are $f$ degrees of freedom where

$$f = \prod_{i=1}^{k} r_i - \sum_{i=1}^{k} (r_i - 1) - 1.$$


f independent cells are then picked out and placed in arbitrary order. The ordered set of 
f numbers obtained is then regarded as an f-figure ‘number’ and all possible ‘numbers’ are 
written out in order of size. In this example the figures in the CX and BX cells, in that order, 
form the ‘number’, f being 2. 
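The ordering rule can be sketched in a few lines. This is only an illustration under the assumption that the first two cells of the top row (CX, then BX) form the two-figure 'number'; the helper names `degrees_of_freedom` and `arrays_2x3` are hypothetical.

```python
from itertools import product
from math import prod

def degrees_of_freedom(dims):
    """f = prod(r_i) - sum(r_i - 1) - 1 for a table with only border totals fixed."""
    return prod(dims) - sum(r - 1 for r in dims) - 1

def arrays_2x3(row_totals, col_totals):
    """All 2 x 3 arrays with the given border totals, in increasing order of the
    'number' formed by the first two cells of the top row."""
    r1, _ = row_totals
    c1, c2, c3 = col_totals
    out = []
    # product() varies the second cell fastest, which gives the order of size.
    for x11, x12 in product(range(c1 + 1), range(c2 + 1)):
        x13 = r1 - x11 - x12
        if 0 <= x13 <= c3:
            out.append(((x11, x12, x13), (c1 - x11, c2 - x12, c3 - x13)))
    return out

arrays = arrays_2x3((5, 12), (6, 8, 3))   # Education M table, columns C, B, A
print(degrees_of_freedom((2, 3)), len(arrays), arrays[1])
```

With the Education M border totals this yields 18 arrays, the second of which is the observed array.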

The second step is the calculation of the separate probabilities $P_{X^{(t)}}$, and this is done here
by breaking down each $P_{X^{(t)}}$ into two factors $Q_L$ and $R_{X^{(t)}}$, where $Q_L$ is independent of $t$.





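Thus
$$P_{X^{(t)}} = Q_L R_{X^{(t)}},$$
and since
$$P_{X^{(t)}} = \frac{\prod_{i=1}^{r} (a_i!) \prod_{j=1}^{s} (b_j!)}{n! \prod_{i=1}^{r} \prod_{j=1}^{s} (x_{ij}!)},$$
we have
$$Q_L = \frac{\prod_{i=1}^{r} (a_i!) \prod_{j=1}^{s} (b_j!)}{n!}$$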
















G. H. FREEMAN AND J. H. HALTON 145
and
$$R_{X^{(t)}} = \frac{1}{\prod_{i=1}^{r} \prod_{j=1}^{s} (x_{ij}!)}.$$
By the use of logarithms the calculations are then performed, $Q_L$ being determined once
for all and each $R_{X^{(t)}}$ separately. In this example the values of $P_{X^{(t)}}$ are then found as follows,
that of $P_L = P_{X^{(2)}}$ being given in black.








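t             |   1    |   2    |   3    |   4    |   5    |   6    |   7    |   8    |   9
$P_{X^{(t)}}$ | 0.0045 | 0.0271 | 0.0339 | 0.0090 | 0.0078 | 0.0815 | 0.1629 | 0.0679 | 0.0024

t             |   10   |   11   |   12   |   13   |   14   |   15   |   16   |   17   |   18
$P_{X^{(t)}}$ | 0.0582 | 0.2037 | 0.1357 | 0.0097 | 0.0776 | 0.0905 | 0.0073 | 0.0194 | 0.0010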






































As a check, all the $P_{X^{(t)}}$ are added together to give $\sum_{t=1}^{18} P_{X^{(t)}} = 1.0001$, the error being easily
covered by the corrections to four decimal places in the working.
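The logarithmic working can be mimicked as follows. This is a sketch, not the authors' hand procedure: it assumes `math.lgamma` as a stand-in for tables of log-factorials, and the names `log_q`, `prob` and `tops` are illustrative.

```python
from math import lgamma, exp

def log_fact(n):
    """Natural logarithm of n!."""
    return lgamma(n + 1)

row_totals, col_totals, n = (5, 12), (6, 8, 3), 17

# log Q_L: depends only on the border totals, so it is computed once.
log_q = sum(log_fact(v) for v in row_totals + col_totals) - log_fact(n)

def prob(top_row):
    """P = Q_L * R for the array whose top (X) row is given."""
    cells = list(top_row) + [c - x for c, x in zip(col_totals, top_row)]
    log_r = -sum(log_fact(v) for v in cells)   # log R: minus the sum of log cell factorials
    return exp(log_q + log_r)

# Top rows of all 18 arrays in the order used in the text.
tops = [(a, b, 5 - a - b) for a in range(6) for b in range(6 - a) if 5 - a - b <= 3]
probs = [prob(top) for top in tops]

print(round(probs[1], 4), round(sum(probs), 6))
```

Carried out in floating point rather than to four decimals, the probabilities sum to unity apart from rounding error.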

The values of $t$ satisfying inequality (10) are then noted, these being $t$ = 1, 2, 4, 5, 9, 13, 16,
17, 18. For these values of $t$ the values of $P_{X^{(t)}}$ are summed to give $\mathfrak{P}_L$.

Thus $\mathfrak{P}_L = 0.0882$ and so, at the 5% level, there is no evidence of heterogeneity in the
distribution of selection centre and training establishment grades in the case of candidates
with educational factor M, in the sample considered.
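The total probability can be reproduced exactly with rational arithmetic. The sketch below assumes the product-of-binomials form of the probability, which for a two-row table is algebraically the same as the factorial expression used in the text; all names are illustrative.

```python
from fractions import Fraction
from math import comb

COLS = (6, 8, 3)        # column totals for grades C, B, A
ROW_X = 5               # total of the X row
DENOM = comb(sum(COLS), ROW_X)

def prob(top):
    """Exact probability of the array whose X row is `top`."""
    ways = comb(COLS[0], top[0]) * comb(COLS[1], top[1]) * comb(COLS[2], top[2])
    return Fraction(ways, DENOM)

tops = [(a, b, ROW_X - a - b) for a in range(ROW_X + 1)
        for b in range(ROW_X + 1 - a) if ROW_X - a - b <= COLS[2]]

p_obs = prob((0, 3, 2))                       # observed array L
in_tail = [t for t, top in enumerate(tops, start=1) if prob(top) <= p_obs]
tail = sum(prob(tops[t - 1]) for t in in_tail)

print(in_tail, round(float(tail), 4))
```

The arrays selected are those listed in the text, and the sum agrees with 0.0882.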

Since in the case of these candidates the probability $\mathfrak{P}_L$ is not far removed from the
significance level 0.05, we may test whether there is evidence of positive association between
the two sets of grades in this sample, regarding X as better than Y, and A as better than B,
B being better than C. Of the arrays set out above, those with positive regression coefficients
are those where $t$ = 1 to 11 and 13. Of these, those with $t$ = 1, 2, 4, 5, 9 and 13 satisfy inequality
(10).

Note. For a general (2 x 3) bivariate table, with the arbitrary values shown here, the 
regression coefficients are positive if: 








(0+4)-@rp>S- IS) 
a b c A +1 
bss eo Libres -1 























In the case of the array considered above, the right-hand side of this inequality has the value
$\dfrac{(-3)(-7)}{17^2}$, or 0.0727. Thus the arrays with positive regression coefficients are those for which

$$(c+d)-(a+f) \geq 1.$$











Considering only arrays with positive regression coefficients we have, writing $P_t$ for $P_{X^{(t)}}$:

$$P_{L,\ \text{positive}} = \frac{P_1+P_2+P_4+P_5+P_9+P_{13}}{P_1+P_2+\dots+P_{11}+P_{13}} = \frac{0.0605}{0.6686} = 0.0905.$$


Thus there is no significant evidence of association between the grades given to candidates 
with educational factor M at the selection centre and at the training establishment, in this 
sample. 

This test for evidence of association is one which cannot be performed by means of $\chi^2$ but
which can be readily done by the present method.
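A sketch of this one-sided test follows. It assumes, as an illustration only, that the sign of the regression is decided by the score $(c+d)-(a+f)$ for an array with top row $(a, b, c)$ and bottom row $(d, e, f)$; the names are hypothetical.

```python
from fractions import Fraction
from math import comb

COLS = (6, 8, 3)        # column totals for grades C, B, A
ROW_X = 5
DENOM = comb(sum(COLS), ROW_X)

tops = [(a, b, ROW_X - a - b) for a in range(ROW_X + 1)
        for b in range(ROW_X + 1 - a) if ROW_X - a - b <= COLS[2]]

def prob(top):
    ways = comb(COLS[0], top[0]) * comb(COLS[1], top[1]) * comb(COLS[2], top[2])
    return Fraction(ways, DENOM)

def score(top):
    """(c + d) - (a + f): positive when high grades go together."""
    a, _, c = top
    d, f = COLS[0] - a, COLS[2] - c
    return (c + d) - (a + f)

p_obs = prob((0, 3, 2))
positive = [t for t, top in enumerate(tops, start=1) if score(top) > 0]
num = sum(prob(tops[t - 1]) for t in positive if prob(tops[t - 1]) <= p_obs)
den = sum(prob(tops[t - 1]) for t in positive)
one_sided = num / den

print(positive, round(float(one_sided), 4))
```

Exact arithmetic gives about 0.0906; the 0.0905 in the text comes from working with probabilities already rounded to four decimal places.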

(ii) The second example considered consists of the attainments of a set of 25 children. These 
are classified according to year of birth (1935/6 and 1937/8/9), attendances at school during 
a particular fortnight (up to 7, and 8 or more) and possession of a certain certificate or not. 
The distribution of the classifications is given below, totals as before being in parentheses, 
totals of totals in brackets and the overall total being in black. 


























Total number of attendances: Up to 7
         | Year of birth     |
         | 1935/6 | 1937/8/9 | Total
  Cert.  |   0    |    0     |  (0)
  Not    |   1    |    4     |  (5)
         |  (1)   |   (4)    |  [5]

Total number of attendances: 8 or more
         | Year of birth     |
         | 1935/6 | 1937/8/9 | Total
  Cert.  |   1    |    2     |  (3)
  Not    |   1    |    16    | (17)
         |  (2)   |   (18)   | [20]

Total number of attendances: Total
         | Year of birth     |
         | 1935/6 | 1937/8/9 | Total
  Cert.  |  (1)   |   (2)    |  [3]
  Not    |  (2)   |   (20)   | [22]
         |  [3]   |   [22]   |  25












































The array is considered as a $(2 \times 2 \times 2)$ trivariate contingency table. The total number of
children in each group of each classification is assumed to be fixed and the array tested for
general homogeneity or heterogeneity, the appropriate expression being expression (6) with
$k = 3$, $r_i = 2$. In the method described in this example not all the arrays $X^{(t)}$ obtained as before
need to be written out but only certain key ones, on principles given below.

On the same system of ordering as used in the first example, $f = 4$ and the four cells the
numbers in which form the four-figure ‘number’ are all but the bottom right-hand cell in
the first table above, first the top row from left to right, then the bottom left-hand one, and
finally the top left-hand cell of the second table. On this system $X^{(1)}$ and $X^{(2)}$ are respectively:

$$X^{(1)} = \left(\begin{array}{cc|cc} 0 & 0 & 0 & 3 \\ 0 & 5 & 3 & 14 \end{array}\right), \qquad X^{(2)} = \left(\begin{array}{cc|cc} 0 & 0 & 1 & 2 \\ 0 & 5 & 2 & 15 \end{array}\right).$$


The expression (6) for the probability may be here written as:

$$P_{X^{(t)}} = \frac{\text{Product of all (border totals!)}}{(n!)^2 \times \text{product of all (cell totals!)}}.$$
















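On this principle we have:

$$P_{X^{(1)}} = \frac{5!\,20!\,3!\,22!\,3!\,22!}{(25!)^2\,0!\,0!\,0!\,5!\,0!\,3!\,3!\,14!} = \frac{6 \times 17 \times 19}{23 \times 23 \times 25},$$

$$P_{X^{(2)}} = \frac{5!\,20!\,3!\,22!\,3!\,22!}{(25!)^2\,0!\,0!\,0!\,5!\,1!\,2!\,2!\,15!} = P_{X^{(1)}} \times \frac{0!\,3!\,3!\,14!}{1!\,2!\,2!\,15!} = P_{X^{(1)}} \times \frac{(3 \times 3)}{(1 \times 15)}.$$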


The principle of this last calculation is clear. Given any $P_{X^{(t_1)}}$, if the difference between
corresponding cell-totals of $X^{(t_1)}$ and $X^{(t_2)}$ is either one or zero, the value of $P_{X^{(t_2)}}$ is the value
of $P_{X^{(t_1)}}$ multiplied by the greater of differing pairs of cell-totals if they occur in $X^{(t_1)}$ and
divided by them if they occur in $X^{(t_2)}$.

If $P_{X^{(t_2)}} = P_{X^{(t_1)}} \times \phi(t_1, t_2)$, we see that the factor $\phi(t_1, t_2)$ is subject to the following conditions:

(i) Every $\phi(t_1, t_2)$ consists of a fraction of which the numerator and denominator contain
the same number of integral factors, in general two.

(ii) If the arrays are numbered systematically in the order explained above, then whenever
both the pairs of arrays $X^{(t-1)}$ and $X^{(t)}$, and $X^{(t)}$ and $X^{(t+1)}$, satisfy the required conditions for
$X^{(t_1)}$ and $X^{(t_2)}$ we have:

(a) Each integer in the numerator of $\phi(t-1, t)$ is greater by one than the corresponding
integer in the numerator of $\phi(t, t+1)$.

(b) Each integer in the denominator of $\phi(t-1, t)$ is less by one than the corresponding
integer in the denominator of $\phi(t, t+1)$.

(iii) If condition (ii) is not satisfied, there is a break in the sequence of factors $\phi$.

It will thus be seen that not all the arrays need be written out. If $X^{(1)}$ and $X^{(2)}$ are written
out and also those values of $X^{(t)}$ and $X^{(t')}$ for which $\phi(t', t)$ does not follow from $\phi(t'', t-1)$,
irrespective of $t'$ and $t''$, which occurs when $\phi(t'', t-1)$ has unity as a factor in its numerator,
no other arrays need be written out; for in all other cases the values of $\phi(t, t+1)$ can be deduced
from those of the preceding $\phi(t-1, t)$. By this method a substantial saving of time and labour
is effected and these rules are followed in writing out the arrays below.
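The stepping factor can be illustrated with a small sketch. It assumes only that two arrays share the same border totals, so that the ratio of their probabilities is the ratio of the products of their cell factorials; the function name `phi` and the flattened cell order are choices made here for illustration.

```python
from fractions import Fraction
from math import factorial, prod

def phi(cells_from, cells_to):
    """Ratio P(to) / P(from) for two arrays with identical border totals."""
    return Fraction(prod(map(factorial, cells_from)), prod(map(factorial, cells_to)))

# Cells listed as: 'up to 7' block (top row, bottom row), then '8 or more' block.
x1 = (0, 0, 0, 5, 0, 3, 3, 14)
x2 = (0, 0, 0, 5, 1, 2, 2, 15)
x3 = (0, 0, 0, 5, 2, 1, 1, 16)

p1 = Fraction(6 * 17 * 19, 23 * 23 * 25)
p2 = p1 * phi(x1, x2)          # factor (3 x 3) / (1 x 15)
p3 = p2 * phi(x2, x3)          # factor (2 x 2) / (2 x 16), following rules (a) and (b)

print(phi(x1, x2), phi(x2, x3), float(p2))
```

Each successive factor has numerator integers one less and denominator integers one greater than the one before, as conditions (a) and (b) state.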





xa) . : be 3 xa() 4 —- nah x0() as = 
b= x0) fia) (3 alias 2% lov): 

x00() ; : 1): xa() ; ” oa xae() AP fy ] 

ca ee ne mG sh) 

x2 alors) (5 ilo as) 20 alse) (0 lz a7) 

ett ala ac)> FCC ain ag)) 72 alta) *Le a/0 0) 

























ae 23.3) 240 IE) a Sg). ame Ie) 
Sa ies He s(t $0) 
aot fh). 0h 9183). aml IS). ze a8. 
a A es en a es 
AY ce A ie 
a A iY es Ge i A i 





(In this particular example most of the arrays need to be written out because of the extremely 
small size of some of the border totals. The gain in time is greater as these increase, indeed had 
these rules been adopted in the case of the first example considered, only 11 of the 18 arrays 
there would have been written out.) 

At the expense of not having a check by summing all the probabilities to make unity,
a considerable further saving of time can be made. In order to determine $\mathfrak{P}_L$, only those
values of $P_{X^{(t)}}$ such that $P_{X^{(t)}} \leq P_L$ need be found. Now if $P_L$ is known and an approximate
calculation shows that some $P_{X^{(t)}}$ is greater than $P_L$, or if both $P_{X^{(t')}} > P_L$ and $\phi(t', t) > 1$,
then there is no need to calculate $P_{X^{(t)}}$.





Now
$$P_L = P_{X^{(6)}} = \frac{5!\,20!\,3!\,22!\,3!\,22!}{(25!)^2\,0!\,0!\,1!\,4!\,1!\,2!\,1!\,16!} = \frac{9 \times 17 \times 19}{23 \times 23 \times 100} \approx 0.055,$$
(1x 16 
Pyw = Pro Seay > Pxw, 
1x15 
Py = Pw aH = Px) > Pxw. 


It is thus easily seen that the values of $t$ when $P_{X^{(t)}} > P_L$ are $t$ = 1, 2, 5, 11, 14. Thus $\mathfrak{P}_L$ is
the sum of $P_{X^{(t)}}$ for all values of $t$ except 1, 2, 5, 11, 14. However, since in this example there
are only 5 values of $t$ not in $\mathfrak{P}_L$, as opposed to 44 in $\mathfrak{P}_L$, it is obviously simpler to calculate
$1-\mathfrak{P}_L$, particularly since $P_{X^{(1)}}$ is already known.

We find $1-\mathfrak{P}_L = 0.6353$ and so $\mathfrak{P}_L = 0.3647$. There is thus no significant evidence of
heterogeneity in the distribution.
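The whole of the second example can be checked by brute-force enumeration. The sketch below is an assumption-laden illustration rather than the shortcut method of the text: it lists every 2 x 2 x 2 array with the given border totals in the 'number' order, applies expression (6) exactly, and sums the probabilities not exceeding that of the observed array. All names are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod

N = 25
BORDER = (5, 20, 3, 22, 3, 22)     # attendance, certificate, year-of-birth totals
CONST = Fraction(prod(map(factorial, BORDER)), factorial(N) ** 2)

def tables():
    """Arrays as (a, b, c, d, e, g, h, k): 'up to 7' block then '8 or more' block,
    each read top row first; ordered by the four-figure 'number' (a, b, c, e)."""
    out = []
    for a in range(4):
        for b in range(4):
            for c in range(4):
                for e in range(4):
                    d = 5 - a - b - c
                    g = 3 - a - b - e          # certificate total is 3
                    h = 3 - a - c - e          # 1935/6 total is 3
                    k = 20 - e - g - h
                    if min(d, g, h, k) >= 0:
                        out.append((a, b, c, d, e, g, h, k))
    return out

def prob(cells):
    return CONST / prod(map(factorial, cells))

all_tables = tables()
p_obs = prob((0, 0, 1, 4, 1, 2, 1, 16))
more_probable = [t for t, x in enumerate(all_tables, start=1) if prob(x) > p_obs]
tail = sum(prob(x) for x in all_tables if prob(x) <= p_obs)

print(len(all_tables), more_probable, round(float(tail), 4))
```

The enumeration finds 49 arrays, of which those numbered 1, 2, 5, 11 and 14 are more probable than the observed one. Exact arithmetic gives a total probability of about 0.3625, close to but slightly below the printed 0.3647; the small difference may reflect rounding in the original hand computation, and the conclusion is unaffected.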

(iv) An even more powerful though less frequently applicable way of eliminating calculation
is the following, which could be used in the second of the examples considered here.
Let there be $N$ possible arrays $X$ and let the level of significance worked to be $\alpha$.
Then if $P_L < \alpha/N$ the total probability $\mathfrak{P}_L$ must be less than $\alpha$; and if $P_L > \alpha$, the total
probability $\mathfrak{P}_L$ must be greater than $\alpha$. In either case, the significance of the effect being
tested is already determined and no more probabilities need be worked out, unless the value
of $\mathfrak{P}_L$ is needed.


















VII. The method described above is generally of use in cases where $\chi^2$ would be used were
not certain of the observed and expected numbers too small. However, it may also be used
where $\chi^2$ is wholly unsuitable, such a case being that described at the end of §VI (i). If this
case alone had been considered, then the calculations, not lengthy in the whole example, 
would have been very much shortened as not all of the probabilities would have needed to 
be worked out. 

The main use of the method is however in cases such as that described in the first part of 
§VI(i). It happens very frequently, particularly in cases where selection and training units 
are considered, that it is wished to compare their assessments of candidates when, under the 
conditions studied, the whole population contains so few members that $\chi^2$ is inapplicable and
still a test of significance is required. It was in order to study such cases that this technique 
was devised, and when the method is used as often as the frequency of occurrence of such 
cases demands it can be reduced to an almost mechanical simplicity by means of the devices 
used in §VI. The method is very easy to handle and all of the calculation is of a simple and 
straightforward character. 













THE GEOMETRY OF ESTIMATION 
By J. DURBIN AND M. G. KENDALL


INTRODUCTION 


1. The theory of statistical estimation is usually expounded by algebraic and analytical 
methods. Apart from the use of a multi-dimensional space by Neyman & Pearson (1928) in 
connexion with fitting theoretical distributions, by Hotelling (1930) and Fisher (1938) in 
a discussion of maximum likelihood estimators, and apart from a remark by Huzurbazar 
(1949) concerning the radius of curvature of a likelihood curve, there seem to have been few 
attempts to consider estimation as a geometrical problem. In the following paper we shall 
show how many of the familiar results in the theory of estimation can be represented simply 
and directly as properties of linear spaces, and that results so apparently different as Gauss’s* 
theorem on least squares and Fisher’s theorem concerning the minimal variance of maximum 
likelihood estimators in the limit are really aspects of one fundamental fact. 

In the ultimate analysis geometrical ‘proofs’ in more than three dimensions are only 
restatements of analytical results in a special language; but they are nevertheless very 
useful, partly because of their elegance and partly because they carry a greater degree of 
conviction and understanding, to some minds at least, than the analytical approach. They 
also suggest generalizations and we shall, in fact, apply them to generalize Gauss’s theorem
to the case of the simultaneous estimation of several parameters.


LINEAR ESTIMATION AND GAUSS’S THEOREM 


2. Consider a set of $n$ random variables $x_1, \dots, x_n$ which are independent and have the
same mean $\mu$ and variance $\sigma^2$. It is required to estimate the mean by a linear estimator
$$t = \sum_{j=1}^{n} \lambda_j x_j. \tag{1}$$
We impose the condition that this shall be unbiased, i.e. that
$$\sum \lambda_j = 1. \tag{2}$$
Throughout the whole of this paper we shall be concerned with one aspect or another of 
linear unbiased estimators of this type. 

Consider now a Euclidean $[n]$ space with co-ordinates $\lambda_1, \dots, \lambda_n$, which we call the estimator
space. The hyperplane (2) corresponds to the range of values of $\lambda$ giving unbiased estimators
and any point $P$ in it determines just one estimator. Now the variance of the estimator is
$\sigma^2 \sum \lambda_j^2$ and hence is $\sigma^2 OP^2$ where $O$ is the origin. It follows that this is a minimum when $P$ is
the foot of the perpendicular from $O$ on to the hyperplane. Symmetry alone is enough to
show that the values of the $\lambda$’s are then all equal. Moreover, the geometrical approach shows
at once that the value is a true minimum, not merely a stationary value, a point which the
analytical or algebraical approach does not bring out immediately.

If $P'$ is a point corresponding to any other estimator $t'$ we have

$$\frac{\operatorname{var} t}{\operatorname{var} t'} = \frac{OP^2}{OP'^2} = \cos^2 \phi, \text{ say,} \tag{3}$$


* The theorem is often attributed to Markoff but in all its essentials is due to Gauss. See the 
historical note by R. L. Plackett, Biometrika, 1949, 36, 458. 


















where $\phi$ is the angle $POP'$. But $\cos \phi$ is the correlation between $t$ and $t'$ and thus, if $E$ is the
efficiency of $t'$ we have
$$\rho(t, t') = \sqrt{\frac{\operatorname{var} t}{\operatorname{var} t'}} = \sqrt{E}. \tag{4}$$
A general proof of this result is given by Cramér (1946).
We notice also from the geometry of the situation that if the $\lambda$’s of $t'$ are near the optimum
values, namely those of $t$, the efficiency of $t'$ is nearly unity. This is important for the practical
utility of the result when we generalize it to the case of unequal variances below.
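Relations (3) and (4) can be illustrated numerically. The sketch below takes an arbitrary set of unbiased weights (the particular values are invented for illustration) and compares the variance ratio with the squared cosine of the angle between the two coefficient vectors.

```python
from math import sqrt

def variance_factor(weights):
    """var(t) / sigma^2 for the linear estimator with the given weights."""
    return sum(w * w for w in weights)

n = 4
best = [1 / n] * n                 # foot of the perpendicular: equal weights
other = [0.4, 0.3, 0.2, 0.1]       # another unbiased estimator (weights sum to 1)

efficiency = variance_factor(best) / variance_factor(other)

# Cosine of the angle POP' between the two coefficient vectors.
dot = sum(p * q for p, q in zip(best, other))
cos_phi = dot / sqrt(variance_factor(best) * variance_factor(other))

print(round(efficiency, 4), round(cos_phi ** 2, 4))
```

The two printed numbers coincide, as (3) requires, and the equal-weight estimator has the smaller variance.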


3. Consider now a second Euclidean $[n]$ space corresponding to the $x$’s, which we call the
observation space. The bilinear form $\sum \lambda_j x_j$ establishes a duality between this and the
estimator space: for any fixed $t$, to a point in one there corresponds a hyperplane in the other.
In particular, to the hyperplane (2) there corresponds the point $(t, t, \dots, t)$, lying on the line
$x_1 = x_2 = \dots = x_n$. Moreover, if a vector through the origin in one space is perpendicular to
a hyperplane in that space, the corresponding hyperplane and vector in the other space are
also orthogonal. If $t$ is allowed to vary, then to a point in one space there will correspond
a family of parallel hyperplanes in the other.

It follows that the unbiased estimator of minimum variance in the observation space is
given by the family of hyperplanes orthogonal to the line $x_1 = x_2 = \dots = x_n$. Thus if $Q$ is the
point $(x_1, \dots, x_n)$ the optimum estimate is given by dropping a perpendicular on to this line,
meeting it at $L$, say. Without loss of generality we may take the origin of the observation
space, say $X$, to be the mean, and the estimate is then $XL$. This procedure is equivalent to
finding $L$ such that $QL^2$ is a minimum. In other language we choose the estimate so as to
minimize $\sum_{j=1}^{n} (x_j - t)^2$, i.e. take the least-squares solution for $t$.


This is Gauss’s theorem in its simplest form. We minimize a sum of squares and emerge 
with an optimum estimator, namely one which minimizes another sum of squares (the 
variance). From the geometrical viewpoint this arises simply from the duality between two 
spaces established by the bilinear form of the type of estimator we are considering. 


4. We can now proceed to generalize the simple form. Suppose the $x$’s come from different
populations with variances $\sigma_j^2$ which are known except for some constant factor $k^2$ (i.e. we
know each $k^2\sigma_j^2$) and with means $\mu_j = \beta a_j$ where the $a_j$’s are known and our object is to
estimate $\beta$. (This is the model for a regression through the origin.) We first of all transform
to new variables
$$\xi_j = \frac{x_j}{k\sigma_j}. \tag{5}$$
The $\xi$’s now have equal variances and means $\beta a_j / k\sigma_j$, say $\beta c_j$, where the $c_j$’s are known.
Geometrically this means a contraction of the variate scales. Taking now
$$t = \sum \lambda_j \xi_j \tag{6}$$
as an estimator of $\beta$ we see from the foregoing argument that the optimum unbiased estimator
is given in the estimator space by the vector through the origin perpendicular to the
hyperplane
$$\sum \lambda_j c_j = 1, \tag{7}$$
that is to say, by choosing the $\lambda$’s to be proportional to the $c$’s. The optimum is thus given by
$$b = \sum c_j \xi_j \Big/ \sum c_j^2 = \sum \frac{a_j x_j}{\sigma_j^2} \Big/ \sum \frac{a_j^2}{\sigma_j^2}. \tag{8}$$










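In the observation space the estimator is given by the family of planes orthogonal to the line
$$\frac{\xi_1}{c_1} = \frac{\xi_2}{c_2} = \dots = \frac{\xi_n}{c_n}, \tag{9}$$
that is to say, by minimizing the sum
$$\sum (\xi_j - \beta c_j)^2 = \sum \left( \frac{x_j - \beta a_j}{k\sigma_j} \right)^2, \tag{10}$$
which leads to the same formula (8).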
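Formula (8) and the minimization (10) can be compared on a small invented data set. The numbers below are purely illustrative, and the constant factor $k$ is taken as 1 since it cancels from both expressions.

```python
def blue_slope(x, a, sigma):
    """Formula (8): sum(a x / sigma^2) / sum(a^2 / sigma^2)."""
    num = sum(ai * xi / si ** 2 for xi, ai, si in zip(x, a, sigma))
    den = sum(ai ** 2 / si ** 2 for ai, si in zip(a, sigma))
    return num / den

def weighted_sum_of_squares(beta, x, a, sigma):
    """The sum (10) with k = 1."""
    return sum(((xi - beta * ai) / si) ** 2 for xi, ai, si in zip(x, a, sigma))

x = [1.1, 2.3, 2.8]        # observations (invented)
a = [1.0, 2.0, 3.0]        # known multipliers of beta
sigma = [1.0, 2.0, 1.0]    # known standard deviations, up to the factor k

b = blue_slope(x, a, sigma)

# The same estimate through the standardized variables xi = x / sigma, c = a / sigma.
xi_vals = [xi / si for xi, si in zip(x, sigma)]
c_vals = [ai / si for ai, si in zip(a, sigma)]
b_std = sum(c * v for c, v in zip(c_vals, xi_vals)) / sum(c * c for c in c_vals)

print(round(b, 5), round(b_std, 5))
```

Both routes give the same value, and moving away from it in either direction increases the sum (10).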


5. Before proceeding to further generalizations we introduce an alternative geometrical
representation for the case just considered, except that we assume the variates already
standardized by division by $k\sigma_j$. This involves no loss of generality.

The observation vector $x$ and the estimator vector $\lambda$ can be represented by the vectors
$OX$ and $OL$ in a single Euclidean $[n]$ space. Let $T$ be the foot of the perpendicular from $X$ to
$OL$. Then $t = OT/OL$ (see Fig. 1).








Fig. 1 


Let $OM$ and $OA$ represent the vectors $\mu$ and $a$. $M$ must lie on $OA$ since $\mu_j = \beta a_j$. The
least-squares estimator is the value of $b$ which minimizes the sum of squares $\sum (x_j - b a_j)^2$.
Let $OB$ represent $ba$. Then $\sum (x_j - b a_j)^2 = XB^2$. Thus $B$ is the projection of $X$ on $OA$. Also
$b = OB/OA$. $MX$ represents the vector $\epsilon = x - \beta a$. $\epsilon_1, \dots, \epsilon_n$ are uncorrelated random
variables with zero means and constant variance $\sigma^2$. Any orthogonal transformation of them
preserves distances and inner products, and so gives new variables that are uncorrelated with
zero means and the same variance. Choose such a transformation in which one of the new
axes coincides with $OA$. Then $MB$ is the corresponding new variable. Thus $MB$ has zero
mean and variance $\sigma^2$, i.e. $OB$ has mean $OM$ and variance $\sigma^2$, i.e. $b$ has mean $\beta$ and variance
$\sigma^2 / OA^2$.

Let $N$ be the projection of $M$ on $OL$. Make a new orthogonal transformation such that one
axis coincides with $OL$. The corresponding new variable is $NT$ which, as before, has zero
mean and variance $\sigma^2$. Thus $OT$ has mean $ON$ and variance $\sigma^2$, i.e. $t \cdot OL$ has mean $\beta\, OA \cos\phi$
and variance $\sigma^2$, where $\phi$ is the angle $MON$. Thus if $t$ is an unbiased estimator of $\beta$ we must
have $OA \cos\phi = OL$, in which case
$$\operatorname{var} t = \frac{\sigma^2}{OA^2 \cos^2\phi} = \frac{\operatorname{var} b}{\cos^2\phi}.$$
This is the result obtained by the earlier method. As before $\cos\phi = \operatorname{corr}(b, t)$.























6. This approach gives another explanation of why the least-squares estimator has least 
variance. Any linear estimator can be obtained as a multiple, say k, of the projection of the 
sample vector upon a fixed vector. The mean of the estimator is simply $k$ times the projection
of the centroid of the distribution upon the fixed vector. For comparative purposes we can 
confine ourselves to fixed vectors of given length. As the fixed vector departs from the vector 
passing through the centroid of the distribution the mean of the sample projection upon it 
diminishes. Thus if the estimator is to be unbiased the value of k must be increased. This 
has the effect of increasing the variance of the estimator. It can be seen that k is least when 
the fixed vector passes through the centroid of the distribution. The projection upon this 
vector gives the least-squares estimator. Thus the least-squares estimator is also the best 
linear unbiased estimator. 


GAUSS’S THEOREM ON THE ESTIMATION OF A LINEAR FUNCTION 
OF SEVERAL PARAMETERS 


7. If the variates have different variances we assume them standardized in the manner of
the preceding sections. A further generalization of Gauss’s theorem then concerns the estimation
of some linear function of parameters $\beta$ which in turn are linearly connected with the
means of the variates. The set-up is the same as in §4, except that $\mu_i = \sum_{j=1}^{k} \beta_j a_{ij}$ $(i = 1, 2, \dots, n)$,
or in matrix notation $\boldsymbol{\mu} = A\boldsymbol{\beta}$ where $A$ is a known matrix. The problem is to estimate the
form $\delta = \sum_{j=1}^{k} d_j \beta_j$, i.e. $\mathbf{d}'\boldsymbol{\beta}$ where the $d$’s are known constants, the $\beta$’s being unknown. The
theorem states that the estimator obtained by substituting in the expression for $\delta$ the least-squares
estimators of $\beta_1, \beta_2, \dots, \beta_k$ is the best linear unbiased estimator of $\delta$.

8. Duality principle. By the usual methods we find that the least-squares estimator of
$\delta$ is given by
$$d = \mathbf{d}'(A'A)^{-1}A'x. \tag{11}$$
This is a bilinear form which represents an $[n-1]$ hyperplane in the observation space which
is perpendicular to the vector $\mathbf{d}'(A'A)^{-1}A'$.

Suppose $t = \sum_{1}^{n} \lambda_j x_j$ is any estimator of $\delta$. If this is unbiased we must have
$$E(t) = \boldsymbol{\lambda}' A \boldsymbol{\beta} = \delta = \mathbf{d}'\boldsymbol{\beta}$$
identically, if $t$ is to be unbiased for all values of the $\beta$’s. Thus we must have
$$\boldsymbol{\lambda}' A = \mathbf{d}'. \tag{12}$$

Now
$$\operatorname{var} t = \sigma^2 \sum \lambda_j^2$$
is a minimum subject to (12) if the vector $\boldsymbol{\lambda}$ in the estimator space is perpendicular to the
$[n-k]$ hyperplane specified by (12). The condition for this is that $\boldsymbol{\lambda}' = \mathbf{d}'(A'A)^{-1}A'$.
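The condition $\boldsymbol{\lambda}' = \mathbf{d}'(A'A)^{-1}A'$ can be checked on a small case. The sketch below is restricted to $k = 2$ so that the inverse can be written out by hand; the design matrix and the competing estimator are invented for illustration, and the function name is hypothetical.

```python
def blue_coefficients(A, d):
    """lambda = A (A'A)^{-1} d for an n x 2 matrix A given as a list of rows."""
    s11 = sum(r[0] * r[0] for r in A)
    s12 = sum(r[0] * r[1] for r in A)
    s22 = sum(r[1] * r[1] for r in A)
    det = s11 * s22 - s12 * s12
    # u = (A'A)^{-1} d, using the explicit inverse of a 2 x 2 matrix
    u = ((s22 * d[0] - s12 * d[1]) / det, (s11 * d[1] - s12 * d[0]) / det)
    return [r[0] * u[0] + r[1] * u[1] for r in A]

def lam_times_A(lam, A):
    """The row vector lambda' A."""
    return [sum(l * r[j] for l, r in zip(lam, A)) for j in range(2)]

A = [[1, 0], [1, 1], [1, 2], [1, 3]]     # straight-line model: intercept and slope
d = (0, 1)                               # estimate the slope alone

lam = blue_coefficients(A, d)

# Another unbiased estimator: add a vector orthogonal to both columns of A.
v = [0.1, -0.2, 0.1, 0.0]
other = [l + w for l, w in zip(lam, v)]

print([round(l, 3) for l in lam], lam_times_A(other, A))
```

Both coefficient vectors satisfy (12), but the sum of squares of the perturbed one is larger, so its variance is greater.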

Once again the estimators are the same since the process of minimizing the sum of squares
in the observation space is the dual of the process of minimizing the variance in the estimator
space. Both processes involve minimizing the distance from a point to a subspace and therefore
give rise to an orthogonal partition of the space.
As before if $t$ is any unbiased linear estimator
$$\operatorname{var} t = \frac{\operatorname{var} b}{\cos^2\phi},$$
where $\phi$ is the angle between the two vectors of coefficients in the estimator space. Also
$\cos\phi = \operatorname{corr}(b, t)$. Equation (4) is therefore generally true.


9. Alternative representation. For simplicity we shall consider the case $k = 2$, though the
arguments are essentially quite general. Let $OX$, $OM$, $OA_1$ and $OA_2$ represent the vectors
$x$, $\mu$, $a_1$ and $a_2$, and along $OA_1$ and $OA_2$ take $OM_1 = \beta_1 OA_1$ and $OM_2 = \beta_2 OA_2$ respectively.
Let $B$ be the projection of $X$ on the [2] space spanned by $OA_1$ and $OA_2$ with $OB_1BB_2$
a quadrilateral as shown in Fig. 2. Then $b_1 = OB_1/OA_1$ and $b_2 = OB_2/OA_2$ are the least squares
estimators of $\beta_1$ and $\beta_2$.








Fig. 2 


As before, $MX$ represents the vector $\epsilon = x - \mu$. By making a linear transformation of the
$\epsilon$’s such that two of the new axes coincide with $OA_1$ and $OA_2$, while the remaining axes are
orthogonal to these, we see that the means of $OB_1$ and $OB_2$ are $OM_1$ and $OM_2$ respectively.
Thus $b_1$ and $b_2$ are unbiased. Consequently, the least-squares estimator $d = d_1 b_1 + d_2 b_2$ is an
unbiased estimator of $\delta = d_1\beta_1 + d_2\beta_2$.

Suppose $t = \sum \lambda_j x_j$ is any unbiased linear estimator of $\delta$. As before $\operatorname{var} t$ is least when the
vector $\lambda$ passes through the centroid $M$. The length
$$OM = \sqrt{\beta_1^2 OA_1^2 + \beta_2^2 OA_2^2 + 2\beta_1\beta_2\, OA_1\, OA_2 \cos\psi},$$
where $\psi$ is the angle $A_1OA_2$. Thus the estimator of $\delta$ with least variance is
$$\frac{\delta}{OM^2} \sum_{j=1}^{n} (\beta_1 a_{1j} + \beta_2 a_{2j})\, x_j.$$
This estimator is of no use in practice, however, since it requires a knowledge of $\beta_1$ and $\beta_2$
which are, of course, unknown. What we require is the best among those unbiased estimators
whose coefficients do not depend upon $\beta_1$ and $\beta_2$.

As before, let $OL$ represent the vector $\lambda$ corresponding to any estimator $t = \sum \lambda_j x_j$ and let
$OT$, $ON$ be the projections of $X$, $M$ on $OL$. Then $OT$ has mean $ON$ and variance $\sigma^2$. Now
$ON = \beta_1 OA_1 \cos\theta_1 + \beta_2 OA_2 \cos\theta_2$, where $\theta_1$ is the angle $LOA_1$ and $\theta_2$ is the angle $LOA_2$. We
require as an estimator of $\delta$ some multiple, say $k$, of $OT$. If this is to be unbiased we must
have $k(\beta_1 OA_1 \cos\theta_1 + \beta_2 OA_2 \cos\theta_2) = d_1\beta_1 + d_2\beta_2$. If, in addition, $t$ is to be independent
of $\beta_1$ and $\beta_2$ we must have
$$\frac{OA_1 \cos\theta_1}{OA_2 \cos\theta_2} = \frac{d_1}{d_2},$$
which is a constant. Thus $OL$ is restricted to move
in an $[n-1]$ space orthogonal to the [2] space spanned by $OA_1$ and $OA_2$.

Of the lines through the origin in this space the one giving the minimum variance estimator
is the line that passes closest to $M$. Clearly this is the intersection of the space with $OA_1A_2$.
There is only one such intersection and this is the line in the space $OA_1A_2$ that gives an
unbiased estimator, that is, it is the line giving the least-squares estimator since this is known
to be unbiased. Consequently, the least-squares estimator is the best linear unbiased
estimator.

As before, if $t = \sum \lambda_j x_j$ is any other estimator $\operatorname{var} t = \operatorname{var} d / \cos^2\phi$, where $\phi$ is the angle
between the two vectors of coefficients. Thus $\phi$ is the angle between the vector $\lambda$ and the
space $OA_1A_2$, that is, $\cos\phi$ is the multiple correlation between $\lambda$ and $a_1$ and $a_2$.

In §15 we present a generalization of Gauss’s theorem to the case when several parameters 
are simultaneously under estimate. 


10. Correlated variates. The case when the $x$’s are not independent has been dealt with
very elegantly from the algebraical stand-point by Aitken (1935). Geometrically, the duality
given by the orthogonal properties in the independent case is replaced by a more complex
position depending on angles. For instance, in the simple case of §2, the optimum point in the
hyperplane is given by the point of contact $P$ with the ellipsoid
$$\sum_{i}\sum_{j} \rho_{ij} \lambda_i \lambda_j = \text{constant}, \tag{13}$$
where $\rho_{ij}$ is the covariance of $x_i$ and $x_j$. To this point there corresponds a family of parallel
planes in the observation space making an angle with the line $x_1 = x_2 = \dots = x_n$ equal to the
angle between $OP$ and the hyperplane in the estimator space.

The use of such results depends, however, on a knowledge of the covariance matrix $\rho_{ij}$
(except for a multiplicative constant) and if the matrix is in fact known it is simpler to
‘orthogonalize’ the problem by transforming to new variates which are independent. This
amounts to linear deformations of the spaces; and as such transformations leave linear
subspaces invariant the general results for the case of independence are linearly transformable
to the case of dependence. In the latter case, instead of minimizing a sum of
squares we minimize a quadratic form in order to give the optimum unbiased estimator.


11. Estimates of the variance. In the simple case of §2 the variance of the optimum
estimator $t$ is given immediately as $\sigma^2 OP^2$, namely $\sigma^2/n$. In the more general case of §7 it is
$\sigma^2 \sum \lambda_j^2$, where $\boldsymbol{\lambda}' = \mathbf{d}'(A'A)^{-1}A'$. What is more usually required is an estimator of these
quantities and the second part of Gauss’s theorem, in its extended form, states that an
unbiased estimator is given by
$$\frac{S}{n-k} \sum \lambda_j^2, \tag{14}$$
where $S$ is the minimum value of the sum of squares used in arriving at the $\lambda$’s. This will
follow if we show that $S/(n-k)$ is an unbiased estimator of $\sigma^2$, which is readily seen to be so
from the consideration that $S$ is the square of the perpendicular from a point in the observation
space on to an $[n-k]$ space.













MAXIMUM-LIKELIHOOD ESTIMATORS 


12. We now consider a third geometrical representation. We regard the parent frequency
distribution as divided into $m$ groups with relative frequencies, say, $a_1, \dots, a_m$. For a sample
of $n$ the observed proportional frequencies in each group, say, $\pi_1, \dots, \pi_m$, will fluctuate about
the $a$’s subject to the condition
$$\sum \pi_j = 1. \tag{15}$$
We consider a Euclidean $[m]$ space in which the co-ordinates are $\pi_1, \dots, \pi_m$. The ‘parent’
point, say $P$, has co-ordinates $a_1, \dots, a_m$. The sample points will cluster round $P$ and as
$n$ becomes large they become a globular cluster for which the variables
$$x_j = \frac{\pi_j - a_j}{\sqrt{a_j}} \tag{16}$$
are normally distributed in the $[m-1]$ space $\alpha$ defined by $\sum x_j \sqrt{a_j} = 0$ with unit variance
and zero mean. A representation of this kind was considered by Neyman & Pearson (1928),
and later by Hotelling (1930) and Fisher (1938).

Now for any value of $\theta$, a parameter of the parent, there will be a corresponding point $Q$
and the locus of such points provides a curve which, following Neyman & Pearson, we call the
Population Locus. $P$ itself lies on this curve. The principle of maximum likelihood requires
that if $S$ is the sample point we estimate $\theta$ by dropping a perpendicular in $\alpha$ from $S$ on to this
curve, meeting it at $T$, say. The estimate of $\theta$ is the value corresponding to the point $T$.
To put it another way, the estimator $t$ of $\theta$ is defined by the family of hyperplanes in $\alpha$
normal to the population locus through the sample points.





13. In the limit the sample points are clustered very closely round P and the population 
locus in the neighbourhood of $P$ is linear. Also the variation of $\theta$ as a function of distance
along the curve tends to linearity. The maximum-likelihood estimator is therefore given by 
hyperplanes normal to a straight line. On transforming to variates which are independent 
and have equal variance, it follows from Gauss’s theorem as proved in § 2 that the maximum- 
likelihood estimator has minimum variance among all linear unbiased estimators, that is 
to say, among all estimators which are based on linear functions of the relative frequencies 
and are consistent. This is Fisher’s theorem. 


14. Consider now the case when several parameters are simultaneously under estimate.
Instead of a population curve we shall have a population surface of $k$ dimensions, where $k$ is
the number of parameters. The maximum likelihood estimator is given by dropping perpendiculars
on to this surface, which again for large samples is flat in the effective range of
variation. To anticipate the argument in the following section, this is equivalent to minimizing
the volume of an ellipsoid or, in the dual form, to minimizing the volume of a hyperpyramid.
Both these are equivalent in turn to minimizing the determinant of the covariance matrix.
Hence we get Geary’s theorem (1942): the maximum likelihood estimators in large samples
minimize the generalized variance
$$V = |\operatorname{cov}(t_i, t_j)|, \tag{17}$$
where $t_i$ is the estimator of $\theta_i$ and $\operatorname{cov}(t_i, t_i) = \operatorname{var} t_i$.













SIMULTANEOUS LINEAR ESTIMATION


15. To illustrate the power of the geometrical method still further we shall generalize
Gauss’s theorem to the case when several parameters (or linear functions of them) are
simultaneously under estimate. Suppose that we have the same model as in §9, i.e. $y$ is
a vector of uncorrelated standardized variates with means given by $E(y) = A\beta$. We require
to estimate the $p$ linear functions of the $\beta$’s given by $\delta = D\beta$, where $\delta = \{\delta_1, \dots, \delta_p\}$ and $D$ is
a known $p \times k$ matrix of rank $< k$. Our generalized form of Gauss’s theorem states that the
linear unbiased estimators which have least generalized variance are obtained by substituting
the least-squares estimators of $\beta_1, \beta_2, \dots, \beta_k$ in the expression for $\delta$.

An algebraic proof of this proposition has been given for the case D =I by Aitken (1948). 
The following derivation may, however, be found easier to grasp since Aitken’s treatment 
requires rather sophisticated algebra. 

We shall first prove the result for the case $D = I$, the unit matrix, that is, we shall show
that the least-squares estimator $b = \{b_1, \dots, b_k\}$ of $\beta$ has minimum generalized variance in the
class of unbiased estimators of $\beta$.

First we note that the generalized variance of a set of variates $z$ with covariance matrix
$V$ is proportional to the square of the volume of the ellipsoid $z'V^{-1}z = \text{constant}$. This ellipsoid
is sometimes called the ellipsoid of concentration; it may be constructed for the least-squares
estimators in the following way.

Using the same geometrical representation as in §9, except that the number k of columns
of A is unrestricted, let $\alpha$ be the space spanned by $OA_1, ..., OA_k$. Then $b_1, ..., b_k$ are constant
multiples of the co-ordinates of B, the orthogonal projection of X on $\alpha$, referred to the oblique
axes $OA_1, ..., OA_k$. For simplicity suppose that each of these multiples is unity. The
orthogonal projection upon $\alpha$ of an [n] sphere C, whose centre is the centroid M of the
distribution of the y's, is a [k] sphere, S say. The sphere S referred to the axes $OA_1, ..., OA_k$
gives the ellipsoid of concentration of $b_1, ..., b_k$.

Let t = Ay be any linear estimator of $\beta$. This matrix equation determines an [n − k] space,
all points of which give the same value of t and which intersects $\alpha$ in a point T'. The
co-ordinates of T' relative to the axes $OA_1, ..., OA_k$ evidently depend linearly upon the elements
$t_1, ..., t_k$ of t. As t varies these co-ordinates have means equal to the co-ordinates of the
centroid of the distribution of the y's, that is they have means $\beta_1, ..., \beta_k$. Since t has been
defined to have mean $\beta$, $t_1, ..., t_k$ must therefore be the co-ordinates of T'.

Every point Y in the space t = Ay gives the same point T'. As Y moves over the sphere
C, T' traces out a [k] ellipsoid, say E, in $\alpha$. E referred to the axes $OA_1, ..., OA_k$ gives the
ellipsoid of concentration of t.

It follows that the sphere S must either coincide with E or must lie inside it. Consequently
the ellipsoid of concentration of b must either coincide with that of t or lie inside it. We
conclude that the generalized variance of t cannot be less than that of b.

The general case of estimation of the linear forms $\delta = D\beta$ can be dealt with in a similar
way though the argument is more complicated. First it must be shown that the least-squares
estimates $d = \{d_1, ..., d_p\}$ can be obtained as the co-ordinates relative to certain
oblique axes of the projection of the sample point on the [p] space spanned by the axes.
Any other estimator t = Ay represents an [n − p] space which meets the [p] space in a
point T'. By considering the conditions for unbiasedness it can be shown that $t_1, ..., t_p$
are the oblique co-ordinates of T'. As before, the ellipsoid of concentration of d coincides
with that of t or lies inside it. Hence the generalized variance of t cannot be less than
that of d.

From the statement of the theorem we also have the notable result that minimization 
of the covariance determinant is equivalent to minimization of each variance separately, 
subject to the condition of unbiasedness, and also to the minimization of any principal 
minor of the determinant. 


REFERENCES 


AITKEN, A. C. (1935). Proc. Roy. Soc. Edinb. 55, 42.
AITKEN, A. C. (1948). Proc. Roy. Soc. Edinb. 62, 369.
CRAMÉR, H. (1946). Mathematical Methods of Statistics, Princeton University Press.
FISHER, R. A. (1938). Theory of Statistical Estimation, University of Calcutta.
GEARY, R. C. (1942). J.R. Statist. Soc. 105, 213.
HOTELLING, H. (1930). Trans. Amer. Math. Soc. 32, 847.
HUZURBAZAR, V. S. (1949). Biometrika, 36, 71.
NEYMAN, J. & PEARSON, E. S. (1928). Biometrika, 20A, 175, 263.
















TESTING FOR SERIAL CORRELATION IN LEAST SQUARES 
REGRESSION. II 


By J. DURBIN,* London School of Economics 
AND 
G. S. WATSON, Department of Applied Economics, University of Cambridge 


1. INTRODUCTION 


In an earlier paper (Durbin & Watson, 1950) the authors investigated the problem of testing 
the error terms of a regression model for serial correlation. Test criteria were put forward, 
their moments calculated, and bounds to their distribution functions were obtained. In the 
present paper these bounds are tabulated and their use in practice is described. For cases in 
which the bounds do not settle the question of significance an approximate method is sug- 
gested. Expressions are given for the mean and variance of a test statistic for one- and two-way 
classifications and polynomial trends, leading to approximate tests for these cases. The 
procedures described should be capable of application by the practical worker without 
reference to the earlier paper (hereinafter referred to as Part I). 

It should be emphasized that the tests described in this paper apply only to regression 
models in which the independent variables can be regarded as ‘fixed variables’. They do not, 
therefore, apply to autoregressive schemes and similar models in which lagged values of the 
dependent variable occur as independent variables. 


2. THE BOUNDS TEST 
Throughout the paper the procedures suggested will be illustrated by numerical examples. 
We begin by considering some data from a demand analysis study. 

Example 1. Annual consumption of spirits from 1870 to 1938. The data (given in Table 1) 
were compiled by A. R. Prest, to whose paper (1949) reference should be made for details 
of the source material. As is common in econometric work the original observations were 
transformed by taking logarithms: 

$y$ = log consumption of spirits per head;

$x_1$ = log real income per head;

$x_2$ = log relative price of spirits (i.e. price of spirits deflated by a cost-of-living index).

We suppose that the observations satisfy the regression model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon, \tag{1}$$

where $\beta_0$ is a constant, $\beta_1$ is the income elasticity, $\beta_2$ is the price elasticity, and $\epsilon$ is a random
error with zero mean and constant variance.


To test the errors for serial correlation the following sums of squares and products are 
required: 


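$$\begin{array}{lll}
\Sigma(y-\bar y)^2 = 5.000123 & \Sigma(y-\bar y)(x_2-\bar x_2) = -3.763579 & \Sigma(\Delta x_2)^2 = 0.083559 \\
\Sigma(x_1-\bar x_1)^2 = 0.632006 & \Sigma(x_1-\bar x_1)(x_2-\bar x_2) = 1.014984 & \Sigma\Delta y\,\Delta x_1 = 0.014685 \\
\Sigma(x_2-\bar x_2)^2 = 2.966354 & \Sigma(\Delta y)^2 = 0.112592 & \Sigma\Delta y\,\Delta x_2 = -0.076399 \\
\Sigma(y-\bar y)(x_1-\bar x_1) = -1.321973 & \Sigma(\Delta x_1)^2 = 0.023539 & \Sigma\Delta x_1\,\Delta x_2 = 0.000527
\end{array}$$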


* Junior research worker at the Department of Applied Economics, Cambridge, when this work was begun.













$\Sigma(\Delta y)^2$ stands for the sum of squares of the first differences of the y's, and $\Sigma\Delta y\,\Delta x$ stands for
the sum of products of the first differences of the y's times the corresponding first differences
of the x's, etc. Thus the first term in $\Sigma(\Delta y)^2$ is $(1.9794 - 1.9565)^2 = 0.00052441$, and the first
term in $\Sigma\Delta y\,\Delta x_1$ is $(1.9794 - 1.9565)(1.7766 - 1.7669) = 0.00022213$. Since there are 69
observations there are 69 terms in each of the first six summations, and 68 in each of the
second six summations.


Table 1. Annual consumption of spirits from 1870 to 1938 





Year | Consumption y | Income x_1 | Price x_2 | Year | Consumption y | Income x_1 | Price x_2
1870 | 1.9565 | 1.7669 | 1.9176 | 1905 | 1.9139 | 1.9924 | 1.9952
1871 | 1.9794 | 1.7766 | 1.9059 | 1906 | 1.9091 | 2.0117 | 1.9905
1872 | 2.0120 | 1.7764 | 1.8798 | 1907 | 1.9139 | 2.0204 | 1.9813
1873 | 2.0449 | 1.7942 | 1.8727 | 1908 | 1.8886 | 2.0018 | 1.9905
1874 | 2.0561 | 1.8156 | 1.8984 | 1909 | 1.7945 | 2.0038 | 1.9859
1875 | 2.0678 | 1.8083 | 1.9137 | 1910 | 1.7644 | 2.0099 | 2.0518
1876 | 2.0561 | 1.8083 | 1.9176 | 1911 | 1.7817 | 2.0174 | 2.0474
1877 | 2.0428 | 1.8067 | 1.9176 | 1912 | 1.7784 | 2.0279 | 2.0341
1878 | 2.0290 | 1.8166 | 1.9420 | 1913 | 1.7945 | 2.0359 | 2.0255
1879 | 1.9980 | 1.8041 | 1.9547 | 1914 | 1.7888 | 2.0216 | 2.0341
1880 | 1.9884 | 1.8053 | 1.9379 | 1915 | 1.8751 | 1.9896 | 1.9445
1881 | 1.9835 | 1.8242 | 1.9462 | 1916 | 1.7853 | 1.9843 | 1.9939
1882 | 1.9773 | 1.8395 | 1.9504 | 1917 | 1.6075 | 1.9764 | 2.2082
1883 | 1.9748 | 1.8464 | 1.9504 | 1918 | 1.5185 | 1.9965 | 2.2700
1884 | 1.9629 | 1.8492 | 1.9723 | 1919 | 1.6513 | 2.0652 | 2.2430
1885 | 1.9396 | 1.8668 | 2.0000 | 1920 | 1.6247 | 2.0369 | 2.2567
1886 | 1.9309 | 1.8783 | 2.0097 | 1921 | 1.5391 | 1.9723 | 2.2988
1887 | 1.9271 | 1.8914 | 2.0146 | 1922 | 1.4922 | 1.9797 | 2.3723
1888 | 1.9239 | 1.9166 | 2.0146 | 1923 | 1.4606 | 2.0136 | 2.4105
1889 | 1.9414 | 1.9363 | 2.0097 | 1924 | 1.4551 | 2.0165 | 2.4081
1890 | 1.9685 | 1.9548 | 2.0097 | 1925 | 1.4425 | 2.0213 | 2.4081
1891 | 1.9727 | 1.9453 | 2.0097 | 1926 | 1.4023 | 2.0206 | 2.4367
1892 | 1.9736 | 1.9292 | 2.0048 | 1927 | 1.3991 | 2.0563 | 2.4284
1893 | 1.9499 | 1.9209 | 2.0097 | 1928 | 1.3798 | 2.0579 | 2.4310
1894 | 1.9432 | 1.9510 | 2.0296 | 1929 | 1.3782 | 2.0649 | 2.4363
1895 | 1.9569 | 1.9776 | 2.0399 | 1930 | 1.3366 | 2.0582 | 2.4552
1896 | 1.9647 | 1.9814 | 2.0399 | 1931 | 1.3026 | 2.0517 | 2.4838
1897 | 1.9710 | 1.9819 | 2.0296 | 1932 | 1.2592 | 2.0491 | 2.4958
1898 | 1.9719 | 1.9828 | 2.0146 | 1933 | 1.2635 | 2.0766 | 2.5048
1899 | 1.9956 | 2.0076 | 2.0245 | 1934 | 1.2549 | 2.0890 | 2.5017
1900 | 2.0000 | 2.0000 | 2.0000 | 1935 | 1.2527 | 2.1059 | 2.4958
1901 | 1.9904 | 1.9939 | 2.0048 | 1936 | 1.2763 | 2.1205 | 2.4838
1902 | 1.9752 | 1.9933 | 2.0048 | 1937 | 1.2906 | 2.1205 | 2.4636
1903 | 1.9494 | 1.9797 | 2.0000 | 1938 | 1.2721 | 2.1182 | 2.4580
1904 | 1.9332 | 1.9772 | 1.9952 |
































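The regression coefficients are calculated by inverting the matrix

$$\begin{bmatrix} \Sigma(x_1-\bar x_1)^2 & \Sigma(x_1-\bar x_1)(x_2-\bar x_2) \\ \Sigma(x_1-\bar x_1)(x_2-\bar x_2) & \Sigma(x_2-\bar x_2)^2 \end{bmatrix}$$

giving for the estimates of $\beta_1$ and $\beta_2$

$$\begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \frac{1}{0.844561}\begin{bmatrix} 2.966354 & -1.014984 \\ -1.014984 & 0.632006 \end{bmatrix}\begin{bmatrix} -1.321973 \\ -3.763579 \end{bmatrix},$$

i.e. $b_1 = -0.120142$, $b_2 = -1.227647$.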
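The arithmetic of this step can be checked directly from the sums of squares and products quoted in § 2. The following is only an illustrative sketch: the helper name `solve_2x2` is an assumption of this example and does not come from the paper.

```python
def solve_2x2(a11, a12, a22, r1, r2):
    """Solve [[a11, a12], [a12, a22]] (b1, b2)' = (r1, r2)' by the adjugate formula."""
    det = a11 * a22 - a12 * a12
    b1 = (a22 * r1 - a12 * r2) / det
    b2 = (-a12 * r1 + a11 * r2) / det
    return b1, b2

# Sums of squares and products about the means, as quoted in the text.
S11, S12, S22 = 0.632006, 1.014984, 2.966354
S1y, S2y = -1.321973, -3.763579

b1, b2 = solve_2x2(S11, S12, S22, S1y, S2y)
print(round(b1, 4), round(b2, 4))
```

The determinant is divided out explicitly, so the function returns the regression coefficients themselves rather than multiples of them.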











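Let z denote the residual from regression, i.e.

$$z = y - \bar y - b_1(x_1 - \bar x_1) - b_2(x_2 - \bar x_2).$$

Then

$$\Sigma z^2 = \Sigma(y-\bar y)^2 - b_1\Sigma(y-\bar y)(x_1-\bar x_1) - b_2\Sigma(y-\bar y)(x_2-\bar x_2) = 0.22095.$$

The statistic to be used for testing for serial correlation is

$$d = \frac{\Sigma(\Delta z)^2}{\Sigma z^2}. \tag{2}$$

The reasons for choosing this statistic have been given in Part I and need not be discussed
here. Now

$$\Delta z = \Delta y - b_1\,\Delta x_1 - b_2\,\Delta x_2,$$

so that

$$\Sigma(\Delta z)^2 = \Sigma(\Delta y)^2 + b_1^2\,\Sigma(\Delta x_1)^2 + b_2^2\,\Sigma(\Delta x_2)^2 - 2b_1\Sigma\Delta y\,\Delta x_1 - 2b_2\Sigma\Delta y\,\Delta x_2 + 2b_1b_2\Sigma\Delta x_1\,\Delta x_2 = 0.054967.$$

Substituting in (2) we have d = 0.2488.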
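The calculation of d from the twelve sums can be sketched as follows. All numerical inputs are the values quoted in the text; the variable names are choices made for this illustration only.

```python
# Quantities quoted in the text for Example 1.
b1, b2 = -0.120142, -1.227647
Syy, S1y, S2y = 5.000123, -1.321973, -3.763579      # sums about the means
Dyy, D11, D22 = 0.112592, 0.023539, 0.083559        # sums of squared first differences
Dy1, Dy2, D12 = 0.014685, -0.076399, 0.000527       # sums of products of first differences

# Residual sum of squares: the denominator of d.
sum_z2 = Syy - b1 * S1y - b2 * S2y

# Sum of squared first differences of the residuals: the numerator of d.
sum_dz2 = (Dyy + b1 ** 2 * D11 + b2 ** 2 * D22
           - 2 * b1 * Dy1 - 2 * b2 * Dy2 + 2 * b1 * b2 * D12)

d = sum_dz2 / sum_z2
print(round(sum_z2, 5), round(sum_dz2, 6), round(d, 4))
```

Neither the residuals nor their differences need be formed individually; only sums of squares and products of the original variables and of their first differences enter.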

We must now decide what departures from the null hypothesis of serial independence of 
the errors € need be considered. Experience with econometric data such as the present 
indicates a test against the existence of positive serial correlation. If the errors were posi- 
tively serially correlated, d would tend to be relatively small, while if the errors were nega- 
tively serially correlated d would tend to be large. We therefore require a critical value of d, 
say d*, such that if the observed value of d is less than d* we may infer that positive serial 
correlation is established at the significance level concerned. 

It was shown in Part I that exact critical values of this kind cannot be obtained. However,
it is possible to calculate upper and lower bounds to the critical values. These are denoted by
$d_U$ and $d_L$. If the observed d is less than $d_L$ we conclude that the value is significant, while
if the observed d is greater than $d_U$ we conclude that the value is not significant at the
significance level concerned. If d lies between $d_L$ and $d_U$ the test is inconclusive.

Significance points of $d_L$ and $d_U$ are tabulated for various levels in Tables 4, 5 and 6. In
addition, a diagram is given to facilitate the test procedure in the most usual case of a test
against positive serial correlation at the 5% level (Fig. 1). k' is the number of independent
variables.

In the present example n = 69 and k' = 2, so that at the 5% level $d_L$ = 1.54 approximately.
The observed value 0.25 is less than this and therefore indicates significant positive serial
correlation at the 5% level. In fact, the observed value is also significant at the 1% level.

The procedure for other values of k’ is exactly similar. In all cases the value of d given by 
(2) is calculated, the z’s being the residuals from regression, and the appropriate table is 
consulted. 
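The decision rule of the bounds test is simple enough to state as a small function. This is a sketch only: the function name and the returned labels are assumptions of the example, and the bounds themselves must still be read from the tables.

```python
def bounds_test(d, d_lower, d_upper):
    """Classify an observed d against tabulated bounds (test for positive serial correlation)."""
    if d_lower > d_upper:
        raise ValueError("d_lower must not exceed d_upper")
    if d < d_lower:
        return "significant"
    if d > d_upper:
        return "not significant"
    return "inconclusive"

# Example 1: d = 0.2488 and d_L = 1.54 come from the text. The value passed for
# d_U is a placeholder for illustration and is NOT read from the tables.
print(bounds_test(0.2488, d_lower=1.54, d_upper=1.70))
```

Because the observed d lies below the lower bound, the outcome here does not depend on the placeholder used for the upper bound.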


Tests against negative serial correlation and two-sided tests 


Tests against negative serial correlation may sometimes be required. For instance, it is
a common practice in econometric work to analyse the first differences of the observations
rather than the observations themselves, on the ground that the serial correlation of the
transformed errors is likely to be less than that of the original errors. We may wish to ensure
that the transformation has not overcorrected, thus introducing negative serial correlation
into the transformed errors. To make a test against negative serial correlation, d is calculated
as above and subtracted from 4. The quantity $4 - d$ may now be treated as though it were the
value of a d-statistic to be tested for positive serial correlation. Thus if $4 - d$ is less than $d_L$,
there is significant evidence of negative serial correlation, and if $4 - d$ is greater than $d_U$,
there is not significant evidence; otherwise the test is inconclusive.


[Figure: curves of the bounds plotted against n, from n = 20 to n = 100.]

Fig. 1. Graphs of 5% values of $d_L$ and $d_U$ against n for k' = 1, 2, 3, 4, 5.


When there is no prior knowledge of the sign of the serial correlation, two-sided tests may
be made by combining single-tail tests. Using only equal tails, d will be significant at level
$\alpha$ if either d is less than $d_L$ or $4 - d$ is less than $d_L$, non-significant if d lies between $d_U$ and $4 - d_U$,
and inconclusive otherwise; the level $\alpha$ may be 2, 5 and 10%. Thus, by using the 5% values
of $d_L$ and $d_U$ from Table 4, a two-sided test at the 10% level is obtained.
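The test against negative serial correlation and the equal-tails two-sided test can be written in the same style. The sketch below assumes that the bounds passed in are the tabulated one-sided values; the function names and labels are illustrative and not part of the paper.

```python
def one_sided(d, d_lower, d_upper):
    """Bounds test against positive serial correlation."""
    if d < d_lower:
        return "significant"
    if d > d_upper:
        return "not significant"
    return "inconclusive"

def negative_test(d, d_lower, d_upper):
    """Test against negative serial correlation: treat 4 - d as a d-statistic."""
    return one_sided(4.0 - d, d_lower, d_upper)

def two_sided_test(d, d_lower, d_upper):
    """Equal-tails test; its level is twice the level of the tabulated bounds."""
    if d < d_lower or 4.0 - d < d_lower:
        return "significant"
    if d_upper < d < 4.0 - d_upper:
        return "not significant"
    return "inconclusive"

# Hypothetical bounds, chosen only to exercise the three outcomes.
for value in (1.0, 1.6, 2.0, 2.45, 3.0):
    print(value, negative_test(value, 1.5, 1.7), two_sided_test(value, 1.5, 1.7))
```

The two-sided rule is just the two one-sided rules applied together, which is why bounds tabulated at the 5% level give a test at the 10% level.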


3. REGRESSION THROUGH THE ORIGIN 


The procedures described so far apply only to cases in which means have been fitted in the
regressions, i.e. the fitted regression equations take the form

$$y - \bar y = b_1(x_1 - \bar x_1) + \dots + b_{k'}(x_{k'} - \bar x_{k'}). \tag{3}$$



















These are the most common cases in practice. However, we occasionally require a fitted 
regression through the origin of the form 


$$y = B_1x_1 + \dots + B_{k'}x_{k'}. \tag{4}$$


Tables 4, 5 and 6 do not apply directly to the residuals from regressions of this type. To test 
for serial correlation in such cases an equation of the form (3) must first be fitted. The residuals 
from the resulting regression may then be tested by the procedures of §§ 2 and 4. This gives 
a perfectly valid test even though (4) might be the more appropriate regression to fit for other 
purposes. 

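In order to avoid inverting more than one matrix the following method, due to Cochran
(1938), should be used. First of all, the regression coefficients of equation (4) are determined.
In this operation the inverse of the matrix of squares and cross-products of $x_1, ..., x_{k'}$ will be
calculated; denote this matrix by $C = \{c_{ij}\}$ and the means of the variables $y, x_1, ..., x_{k'}$ by
$\bar y, \bar x_1, ..., \bar x_{k'}$. Then

$$\begin{bmatrix} b_1 \\ \vdots \\ b_{k'} \end{bmatrix} = \begin{bmatrix} B_1 \\ \vdots \\ B_{k'} \end{bmatrix} + B_0\,C\begin{bmatrix} \bar x_1 \\ \vdots \\ \bar x_{k'} \end{bmatrix},
\qquad\text{where}\qquad
B_0 = \frac{\sum_{i=1}^{k'} B_i\bar x_i - \bar y}{\dfrac{1}{n} - \sum_{i,j=1}^{k'} c_{ij}\,\bar x_i\bar x_j}.$$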
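The idea of reusing the single inverse C can be illustrated numerically. In the sketch below the adjustment is written in one algebraically convenient form, obtained from the Sherman-Morrison identity for a rank-one change to a matrix; it is offered as an assumption-laden illustration of the idea, not as a transcription of Cochran's own formula. The data and all function names are made up for the example.

```python
def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean(u):
    return sum(u) / len(u)

def mean_fitted_from_origin(C, B, xbar, ybar, n):
    """Turn through-origin coefficients B into mean-fitted coefficients, reusing C."""
    Cx = matvec(C, xbar)
    B0 = (dot(B, xbar) - ybar) / (1.0 / n - dot(xbar, Cx))
    return [Bi + B0 * ci for Bi, ci in zip(B, Cx)]

# Made-up illustrative data (not from the paper).
x1 = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
x2 = [2.0, 1.0, 3.0, 6.0, 5.0, 9.0]
y = [3.1, 2.9, 6.2, 9.8, 10.1, 15.0]
n = len(y)

# Through-origin fit: the only inversion, on raw squares and cross-products.
C = inv2([[dot(x1, x1), dot(x1, x2)], [dot(x1, x2), dot(x2, x2)]])
B = matvec(C, [dot(x1, y), dot(x2, y)])
b = mean_fitted_from_origin(C, B, [mean(x1), mean(x2)], mean(y), n)

# Direct mean-fitted regression, computed here for comparison only.
c1 = [v - mean(x1) for v in x1]
c2 = [v - mean(x2) for v in x2]
cy = [v - mean(y) for v in y]
b_direct = matvec(inv2([[dot(c1, c1), dot(c1, c2)], [dot(c1, c2), dot(c2, c2)]]),
                  [dot(c1, cy), dot(c2, cy)])
print(b, b_direct)
```

The second inversion appears only to confirm the result; in practice it is exactly the step the method avoids.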





4. APPROXIMATE PROCEDURE WHEN THE BOUNDS TEST IS INCONCLUSIVE 


No satisfactory procedure of general application seems to be available for cases in which the 
bounds test is inconclusive. However, an approximate test can be made, and this should be 
sufficiently accurate if the number of degrees of freedom is large enough, say greater than 40. 
For smaller numbers this test can only be regarded as giving a rough indication. 

The method used is to transform d so that its range of variation is approximately from 0 
to 1 and to fit a Beta distribution with the same mean and variance. The mean and variance 
of d vary according to the values of the independent variables, so the first step is to calculate 
them for the particular case concerned. The method of calculation will be illustrated by 
means of the data of Example 1, although in practice an approximate test would not be 
required for this case since the bounds test has given a definite answer. 

The description of the computing procedure is greatly facilitated by the introduction of 
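matrix notation. Thus the set $\{y_1, y_2, ..., y_n\}$ of observations of the dependent variable is
denoted by the column vector y. In the same way the set

$$\begin{matrix} x_{11} & x_{21} & \cdots & x_{k'1} \\ \vdots & \vdots & & \vdots \\ x_{1n} & x_{2n} & \cdots & x_{k'n} \end{matrix}$$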
of observations of the independent variables is denoted by the matrix X. We suppose that 
all these observations are measured from the sample means. The corresponding sets of 


first differences of the observations are denoted by $\Delta y$ and $\Delta X$. The numerator of (2) is the
quadratic form

$$(\Delta z)'(\Delta z) = z'Az,$$

where A is the real symmetric matrix

$$A = \begin{bmatrix}
1 & -1 & 0 & \cdots & 0 & 0 \\
-1 & 2 & -1 & \cdots & 0 & 0 \\
0 & -1 & 2 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 2 & -1 \\
0 & 0 & 0 & \cdots & -1 & 1
\end{bmatrix}.$$





The moments of d are obtained by calculating the traces of certain matrices. The trace of 
a square matrix is simply the sum of the elements in the leading diagonal. For example, the 
trace of A, denoted by tr A, is 2(n—1), where n is the number of rows or columns in A. 

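It was shown in Part I that the mean and variance of d are given by

$$E(d) = \frac{P}{n - k' - 1}, \tag{5}$$

$$\operatorname{var} d = \frac{2}{(n - k' - 1)(n - k' + 1)}\{Q - P\,E(d)\}, \tag{6}$$

where

$$P = \operatorname{tr} A - \operatorname{tr}\{X'AX(X'X)^{-1}\}, \tag{7}$$

and

$$Q = \operatorname{tr} A^2 - 2\operatorname{tr}\{X'A^2X(X'X)^{-1}\} + \operatorname{tr}[\{X'AX(X'X)^{-1}\}^2]. \tag{8}$$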


The elements of $(X'X)^{-1}$ will have been obtained for the calculation of the regression
coefficients, and the elements of $X'AX$ for the calculation of d; for $X'AX = (\Delta X)'(\Delta X)$, so that the
(i, j)th element of $X'AX$ is simply the sum of products $\Sigma\Delta x_i\,\Delta x_j$. Thus the only new matrix
requiring calculation is $X'A^2X$. Now $X'A^2X$ is very nearly equal to $(\Delta^2X)'(\Delta^2X)$, where
$\Delta^2X$ represents the matrix of second differences of the independent variables. Thus the
(i, j)th element of $X'A^2X$ will usually be given sufficiently closely by $\Sigma(\Delta^2x_i)(\Delta^2x_j)$, where
$\Delta^2x_i$ stands for the second difference of the ith independent variable. (More exactly

$$(X'A^2X)_{ij} = \Sigma(\Delta^2x_i)(\Delta^2x_j) + (x_{i1} - x_{i2})(x_{j1} - x_{j2}) + (x_{i,n-1} - x_{in})(x_{j,n-1} - x_{jn}).)$$
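The properties of A used here (its trace, the trace of its square, and the expression of $X'AX$ and $X'A^2X$ through first and second differences) can be confirmed numerically on a small case. The two columns of numbers below are made up for the illustration, and the helper names are assumptions of this sketch.

```python
def diff_matrix_A(n):
    """A such that z'Az is the sum of squared first differences of z_1, ..., z_n."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1 if i in (0, n - 1) else 2
        if i + 1 < n:
            A[i][i + 1] = A[i + 1][i] = -1
    return A

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def bilinear(u, M, v):
    """The bilinear form u'Mv."""
    return sum(u[i] * M[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def first_diff(x):
    return [x[t + 1] - x[t] for t in range(len(x) - 1)]

n = 8
A = diff_matrix_A(n)
A2 = matmul(A, A)

# Two made-up columns of X (illustrative values only).
xi = [3, 1, 4, 1, 5, 9, 2, 6]
xj = [2, 7, 1, 8, 2, 8, 1, 8]
dxi, dxj = first_diff(xi), first_diff(xj)
ddxi, ddxj = first_diff(dxi), first_diff(dxj)

lhs_first = bilinear(xi, A, xj)                      # element of X'AX
rhs_first = sum(a * b for a, b in zip(dxi, dxj))     # sum of products of first differences
lhs_second = bilinear(xi, A2, xj)                    # element of X'A^2X
rhs_second = (sum(a * b for a, b in zip(ddxi, ddxj))
              + (xi[0] - xi[1]) * (xj[0] - xj[1])
              + (xi[-2] - xi[-1]) * (xj[-2] - xj[-1]))
print(trace(A), trace(A2), lhs_first, rhs_first, lhs_second, rhs_second)
```

The two end corrections in `rhs_second` are the only difference between the exact element of $X'A^2X$ and the plain sum of products of second differences.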


The calculations will be exemplified by the data of Example 1. Referring to § 2 we see that

$$X'AX = \begin{bmatrix} 0.023539 & 0.000527 \\ 0.000527 & 0.083559 \end{bmatrix}, \qquad
(X'X)^{-1} = \begin{bmatrix} 2.966354 & -1.014984 \\ -1.014984 & 0.632006 \end{bmatrix}.$$

Although $\operatorname{tr} X'AX(X'X)^{-1}$ is simply the sum of the two diagonal elements of the product of
these two matrices, we shall need the remaining two elements below, so the whole matrix
is computed giving

$$X'AX(X'X)^{-1} = \begin{bmatrix} 0.069290 & -0.023559 \\ -0.083248 & 0.052275 \end{bmatrix}.$$

Thus

$$\operatorname{tr} X'AX(X'X)^{-1} = 0.069290 + 0.052275 = 0.121565.$$













Substituting in (7) and (5) and remembering that $\operatorname{tr} A = 2(n - 1) = 136$ in the present case,
we have

$$E(d) = \frac{135.878435}{66} = 2.05876.$$

The matrix of sums of squares and products of second differences is found to be

$$X'A^2X = \begin{bmatrix} 0.035867 & -0.004495 \\ -0.004495 & 0.116368 \end{bmatrix}.$$

$\operatorname{tr} X'A^2X(X'X)^{-1}$ is obtained by multiplying the first column of $(X'X)^{-1}$ into the first row
of $X'A^2X$ and adding the product of the second column of $(X'X)^{-1}$ into the second row of
$X'A^2X$, giving $\operatorname{tr} X'A^2X(X'X)^{-1} = 0.189064$. $\operatorname{tr}[\{X'AX(X'X)^{-1}\}^2]$ is simply the sum of
squares of the elements of the matrix $X'AX(X'X)^{-1}$, i.e. 0.0150189. Also

$$\operatorname{tr} A^2 = 2(3n - 4) = 406.$$
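Substituting in (8) and (6) we have

$$\operatorname{var} d = \frac{2}{66 \times 68}(405.636891 - 135.878435 \times 2.05876) = 0.0561033.$$

We now assume that $\tfrac14 d$ is distributed in the Beta distribution with density

$$\frac{1}{B(p, q)}\left(\tfrac14 d\right)^{p-1}\left(1 - \tfrac14 d\right)^{q-1}.$$

This distribution gives

$$E(d) = \frac{4p}{p+q}, \qquad \operatorname{var} d = \frac{16pq}{(p+q)^2(p+q+1)},$$

from which we find p and q by the equations

$$p + q = \frac{E(d)\{4 - E(d)\}}{\operatorname{var} d} - 1, \qquad p = \tfrac14(p+q)\,E(d). \tag{9}$$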
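Formulae (5) to (9) chain together into a short calculation. The sketch below takes the three trace values exactly as quoted in the text for Example 1 and does not recompute them from the data; the function and argument names are assumptions of the example.

```python
def moments_of_d(n, k, tr_a, tr_b, tr_c):
    """Mean and variance of d from formulae (5)-(8).

    n     number of observations
    k     number of independent variables (k' in the text)
    tr_a  tr{X'AX (X'X)^-1}
    tr_b  tr{X'A^2X (X'X)^-1}
    tr_c  tr[{X'AX (X'X)^-1}^2]
    """
    P = 2 * (n - 1) - tr_a                      # tr A = 2(n - 1)
    Q = 2 * (3 * n - 4) - 2 * tr_b + tr_c       # tr A^2 = 2(3n - 4)
    mean = P / (n - k - 1)
    var = 2.0 / ((n - k - 1) * (n - k + 1)) * (Q - P * mean)
    return mean, var

def beta_parameters(mean, var):
    """Solve equations (9) for p and q, with d/4 treated as a Beta(p, q) variate."""
    total = mean * (4.0 - mean) / var - 1.0     # p + q
    p = 0.25 * total * mean
    return p, total - p

# Trace values exactly as quoted in the text for Example 1 (n = 69, k' = 2).
mean, var = moments_of_d(69, 2, 0.121565, 0.189064, 0.0150189)
p, q = beta_parameters(mean, var)
print(round(mean, 5), round(var, 6), round(p, 3), round(q, 3))
```

The sketch carries the unrounded mean through the variance, so the later decimals can differ slightly from hand working that rounds E(d) first.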

To test against positive serial correlation we require the critical value of $\tfrac14 d$ at the lower tail
of the distribution. If 2p and 2q are integers, this can be obtained from Catherine Thompson's
tables (1941), or indirectly from tables of the variance ratio or Fisher's z, such as those in the
Fisher-Yates tables (1948); if 2p and 2q are not both integers, a first approximation may be
found using the nearest integral values. Thus

$$F = \frac{p(4 - d)}{qd}$$

is distributed as the variance ratio and $z = \tfrac12\log_e F$ is Fisher's z, both with $n_1 = 2q$, $n_2 = 2p$ degrees of freedom.

For moderately large numbers of observations a convenient way of finding the significance
point when 2p and 2q are not integral is to use Carter's (1947) approximation to Fisher's z.
This states that the critical value of z is approximately

$$z = \frac{\xi\sqrt{h + \lambda}}{h} - \left(\frac{1}{2q - 1} - \frac{1}{2p - 1}\right)\left(\lambda + \frac56 - \frac{2}{3h}\right),$$

where

$$\frac{2}{h} = \frac{1}{2p - 1} + \frac{1}{2q - 1}, \qquad \lambda = \frac{\xi^2 - 3}{6}.$$

The values of $\xi$ and $\lambda$ to be used for 5 and 1% tests against positive serial correlation are
as follows:

          5%        1%
$\xi$      1.6449    2.3263
$\lambda$  -0.0491   0.4020
Returning to the numerical example we find from (9)

p = 36.1495, q = 34.0860, whence F = 15.99 and z = 1.3848.

Carter's approximation gives a critical 1% value of z of 0.278, which is less than the observed
value, thus indicating significant serial correlation. Here the significance is so marked that
it may be seen immediately by referring to a table of significant points of z or F around
$n_1 = n_2 = 70$ (e.g. Snedecor, 1937).
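The conversion from d to F and z, and the comparison with an approximate critical value, can be sketched as follows. The form of Carter's approximation coded here, with $n - 1$ in the reciprocals, is the one commonly quoted in reference handbooks; it is treated as an assumption of this sketch and should be checked against Carter (1947) before relying on the third decimal place. Function names are illustrative.

```python
import math

def f_and_z(d, p, q):
    """Variance ratio and Fisher's z for the lower tail of d (n1 = 2q, n2 = 2p)."""
    F = p * (4.0 - d) / (q * d)
    return F, 0.5 * math.log(F)

def carter_z(xi, n1, n2):
    """Approximate upper critical value of Fisher's z for the normal deviate xi."""
    a, b = 1.0 / (n1 - 1.0), 1.0 / (n2 - 1.0)
    h = 2.0 / (a + b)
    lam = (xi * xi - 3.0) / 6.0
    return xi * math.sqrt(h + lam) / h - (a - b) * (lam + 5.0 / 6.0 - 2.0 / (3.0 * h))

p, q, d = 36.1495, 34.0860, 0.2488        # values quoted in the text
F, z = f_and_z(d, p, q)
z_crit_1pct = carter_z(2.3263, 2 * q, 2 * p)
print(round(F, 2), z > z_crit_1pct)
```

Whatever the exact third decimal of the critical value, the observed z is several times larger, so the conclusion of the example does not hinge on the approximation.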

For testing against negative serial correlation the same procedure is used except that d is 
replaced throughout by 4—d. 


5. ONE- AND TWO-WAY CLASSIFICATIONS 


In any regression analysis where the independent variables assume the same values in all 
applications it would be theoretically possible to dispense with the bounds and tabulate the 
significance points of d once and for all. This could be done, for instance, for analysis of 
variance models such as one- and two-way classifications and for polynomial regressions with 
equally spaced variate values. The calculation of tables of this type is rather a formidable 
task and only one set has so far been published, namely, the significance points of the circular 
serial correlation coefficient of the least squares residuals from Fourier regressions, tabulated 
by R. L. & T. W. Anderson (1950). Pending the publication of further tables the bounds 
test of the present paper may be used, with the approximate procedure described in § 4 for 
cases in which the bounds do not give a decisive result, or when the n and k’ are beyond the 
range of Tables 4, 5 and 6. In the present section the calculation of d is described for one- 
and two-way classifications and expressions are given for its mean and variance, while in the 
next section the same is done for polynomial regressions. 

It is convenient to think of the observations as a time series consisting of monthly obser- 
vations recorded for a number of years, though the results are of course of general application. 
We may fit constants for years only or for months only or for both months and years. An 
exact test of serial correlation in the ‘months only’ case may be made by means of R. L. & 
T. W. Anderson’s tables (1950). For the ‘years only’ and the ‘months and years’ case no exact 
test is at present available. The tests described in this paper may, however, be used in all 
three cases. 


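If there are s 'years' each of t 'months', the mean and variance of d for the three models
are as follows:

'Years' only:

$$E(d) = 2\left(1 + \frac{1}{t} - \frac{1}{st}\right),$$

$$\operatorname{var} d = \frac{4}{s(t-1)(st-s+2)}\left\{st - 2s - \frac{2s}{t} + \frac{4}{t} - \frac{2}{st} + \frac{5s}{t^2} - \frac{8}{t^2} + \frac{2}{st^2}\right\}. \tag{10}$$

'Months' only:

$$E(d) = 2\left(1 - \frac{1}{st}\right),$$

$$\operatorname{var} d = \frac{4}{t(s-1)(st-t+2)}\left\{st - t - \frac{2}{s} + \frac{2}{s^2} - \frac{2}{st} + \frac{2}{s^2t}\right\}. \tag{11}$$

'Years and months':

$$E(d) = 2\left\{1 + \frac{1}{t} - \frac{1}{s(t-1)}\right\},$$

$$\operatorname{var} d = \frac{4}{(s-1)(t-1)^2(st-t-s+3)}\left\{st^2 - t^2 - 3st + 3t + 4 - \frac{2t}{s} + \frac{2t}{s^2} + \frac{7s}{t} - \frac{12}{t} - \frac{5s}{t^2} + \frac{6}{t^2}\right\}. \tag{12}$$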



These formulae were found by substituting the appropriate matrices into the general formulae 
given in Part I. The subsequent reductions were straightforward but extremely tedious. 
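Since the reductions are tedious, a numerical cross-check is useful. The sketch below evaluates the 'years and months' mean and variance in closed form, written out for s years of t months with the observations in time order, and compares them with a brute-force evaluation of the traces for a small layout. The closed form and the function names are stated here as assumptions of the example.

```python
def two_way_moments(s, t):
    """Closed-form E(d) and var d for 'years and months' (s years of t months)."""
    mean = 2.0 * (1.0 + 1.0 / t - 1.0 / (s * (t - 1)))
    poly = (s * t * t - t * t - 3 * s * t + 3 * t + 4
            - 2.0 * t / s + 2.0 * t / s ** 2
            + 7.0 * s / t - 12.0 / t - 5.0 * s / t ** 2 + 6.0 / t ** 2)
    var = 4.0 * poly / ((s - 1) * (t - 1) ** 2 * (s * t - t - s + 3))
    return mean, var

def two_way_moments_brute(s, t):
    """Same moments from P = tr(AM) and Q = tr(AMAM), M being the residual projection."""
    n = s * t
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0 if i in (0, n - 1) else 2.0
        if i + 1 < n:
            A[i][i + 1] = A[i + 1][i] = -1.0
    # M = I - (year means) - (month means) + (grand mean); index i // t is the year,
    # i % t the month, with observations arranged as a single time series.
    M = [[(i == j) - (i // t == j // t) / t - (i % t == j % t) / s + 1.0 / n
          for j in range(n)] for i in range(n)]
    AM = [[sum(A[i][k] * M[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    P = sum(AM[i][i] for i in range(n))
    Q = sum(AM[i][j] * AM[j][i] for i in range(n) for j in range(n))
    nu = (s - 1) * (t - 1)                       # residual degrees of freedom
    mean = P / nu
    return mean, 2.0 * (Q - P * mean) / (nu * (nu + 2))

mean, var = two_way_moments(5, 12)               # the layout of five years of twelve months
print(round(mean, 4), round(var, 6))
```

The brute-force routine uses the general expressions for the mean and variance with n - k' - 1 replaced by the residual degrees of freedom (s - 1)(t - 1), so agreement between the two routes checks the algebra rather than the theory.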
Example 2. To illustrate the test procedures for models of this kind we shall consider the 
two-way classification of the data in Table 2 on the receipts of butter (in units of 1,000,000 lb.)
at five markets (Boston, Chicago, San Francisco, Milwaukee and St Louis). The same data, 
for 1935, 1936 and 1937 only, were used by R. L. & T. W. Anderson (1950) for illustrating 
their procedure for testing serial correlation in the ‘months only’ case. The figures in paren- 
theses are residuals from the monthly averages. 


Table 2. Receipts of butter (millions of lb. weight) at five U.S.A. markets 






































Month   | 1933          | 1934         | 1935          | 1936          | 1937         | Average
Jan.    | 58.3 (+8.20)  | 52.6 (+2.50) | 48.9 (-1.20)  | 48.3 (-1.80)  | 42.4 (-7.70) | 50.10
Feb.    | 51.3 (+5.28)  | 46.9 (+0.88) | 43.4 (-2.62)  | 47.1 (+1.08)  | 41.4 (-4.62) | 46.02
March   | 58.1 (+5.86)  | 57.9 (+5.66) | 43.8 (-8.44)  | 52.4 (+0.16)  | 49.0 (-3.24) | 52.24
April   | 55.1 (+1.86)  | 54.2 (+0.96) | 50.8 (-2.44)  | 55.3 (+2.06)  | 50.8 (-2.44) | 53.24
May     | 74.6 (+5.94)  | 70.6 (+1.94) | 67.6 (-1.06)  | 64.7 (-3.96)  | 65.8 (-2.86) | 68.66
June    | 83.9 (+2.64)  | 73.3 (-7.96) | 83.7 (+2.44)  | 79.5 (-1.76)  | 85.9 (+4.64) | 81.26
July    | 73.5 (+1.56)  | 70.3 (-1.64) | 82.7 (+10.76) | 62.6 (-9.34)  | 70.6 (-1.34) | 71.94
Aug.    | 73.3 (+11.78) | 66.4 (+4.88) | 60.8 (-0.72)  | 51.3 (-10.22) | 55.8 (-5.72) | 61.52
Sept.   | 63.0 (+7.96)  | 56.7 (+1.66) | 55.4 (+0.36)  | 51.0 (-4.04)  | 49.1 (-5.94) | 55.04
Oct.    | 58.3 (+5.58)  | 57.2 (+4.48) | 48.4 (-4.32)  | 54.0 (+1.28)  | 45.7 (-7.02) | 52.72
Nov.    | 55.1 (+9.20)  | 47.7 (+1.80) | 37.7 (-8.20)  | 45.2 (-0.70)  | 43.8 (-2.10) | 45.90
Dec.    | 56.5 (+9.70)  | 44.9 (-1.90) | 41.0 (-5.80)  | 44.9 (-1.90)  | 46.7 (-0.10) | 46.80
Average | 63.42         | 58.22        | 55.35         | 54.69         | 53.92        | 57.12
Total   | 761.0         | 698.7        | 664.2         | 656.3         | 647.0        | 3427.2

Source: Agricultural Statistics, United States Government Printing Office, Washington, D.C., 1939, p. 390.


We require to test for serial correlation in the model

$$y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij} \qquad (i = 1, 2, ..., 5;\ j = 1, 2, ..., 12),$$

where $y_{ij}$ is the observation in the jth month of the ith year, $\mu$, $\alpha_i$ and $\beta_j$ are constants and
$\epsilon_{ij}$ is the error term.













The least squares estimates of $\mu$, $\alpha_i$ and $\beta_j$ are $m$, $a_i - m$ and $b_j - m$, where m is the sample
mean of all the observations and $a_i$ and $b_j$ are the means of observations in the ith year and
the jth month respectively. Thus the residuals are given by

$$z_{ij} = y_{ij} - a_i - b_j + m.$$

The test is made as before by calculating

$$d = \frac{\Sigma(\Delta z)^2}{\Sigma z^2},$$

where the $\Delta z_{ij}$'s are the first differences of the residuals when arranged as a single time series.

$\Sigma(\Delta z)^2$ may be calculated by working out the individual residuals from the monthly
averages and finding their first differences. The difference between the December value of
one year and the January value of the succeeding year then needs to be adjusted to take
account of the difference between the two yearly averages. For instance, the difference
between the 'months only' residuals for December 1933 and January 1934 is

$$2.50 - 9.70 = -7.20.$$

From this must be subtracted the difference between the 1934 and 1933 yearly averages,
i.e. $58.22 - 63.42 = -5.20$. The net difference is therefore $-7.20 + 5.20 = -2.00$. The sum
of the resulting differences squared is $\Sigma(\Delta z)^2$. For the calculation of $\Sigma z^2$ the normal method
for the residual sum of squares may be used, i.e. find the sum of squares of the original
observations and subtract the sums of squares due to the fitted constants. Alternatively,
with the above method of calculating $\Sigma(\Delta z)^2$, the sum of squares of the residuals from the
monthly averages may be calculated directly, from which it only remains to subtract the
sum of squares due to years.
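The year-end adjustment amounts to one subtraction, sketched here with the December 1933 and January 1934 figures quoted above. The function and argument names are assumptions of the example.

```python
def boundary_difference(dec_resid, jan_resid, avg_this_year, avg_next_year):
    """First difference of the two-way residuals across a December-January boundary,
    starting from residuals taken about the monthly averages only."""
    return (jan_resid - dec_resid) - (avg_next_year - avg_this_year)

# December 1933 -> January 1934, using the figures quoted in the text.
net = boundary_difference(9.70, 2.50, 63.42, 58.22)
print(round(net, 2))   # -2.0
```

Differences between successive months of the same year need no adjustment, because the yearly average cancels within a year.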

For the data in the table we find $\Sigma(\Delta z)^2 = 1191.2454$ and $\Sigma z^2 = 850.8890$, giving

$$d = \frac{1191.2454}{850.8890} = 1.4000.$$

It remains to test the significance of this value of d by the method
of § 4. For this purpose the formulae (12) may be evaluated with s = 5 and t = 12 to give

E(d) = 2.1303, var d = 0.077964.

If $\tfrac14 d$ is assumed to have a Beta-distribution with parameters p, q, then values of E(d) and
var d may be substituted in the formulae (9) to give

p = 26.6758, q = 23.4125.

The observed value of d is 1.4000, so that


$$F = \frac{p(4 - d)}{qd} = 2.12$$

with $n_1 \approx 47$, $n_2 \approx 53$ degrees of freedom. A cursory examination of the significant points of
F shows that the 5% point is certainly less than 1.63, while the 1% point is less than 2.00.
Thus our value of F is significant at the 1% level, and the hypothesis of serial independence
of the errors in the above model may be considered untenable.


6. POLYNOMIAL REGRESSIONS 


An important application of the least squares method in time-series analysis is in the fitting 
of polynomial trend lines. When the values of the time variate (or its equivalent in other 
applications) are spaced at equal intervals, the fitting is carried out most expeditiously by 



















means of orthogonal polynomials. In what follows we shall assume that the $\xi'$ polynomials
tabulated by Fisher & Yates (1948), and in a more extended form by Anderson & Houseman
(1942), have been used. The regression model is

$$y = \beta_0 + \beta_1\xi'_1 + \beta_2\xi'_2 + \dots + \beta_{k'}\xi'_{k'} + \epsilon,$$

where $\xi'_i$ is the polynomial of ith degree in x, the independent variable, which we suppose
takes the values 1, 2, ..., n, $\beta_0, \beta_1, ..., \beta_{k'}$ are constants and $\epsilon$ is the error term. We require
to test the error term for serial correlation.

The test procedure is a good deal less laborious in this application than in the ordinary 
regression case described in § 2. We shall illustrate it by considering the following example. 
The data were taken by Anderson & Houseman (1942) from Schultz’s demand studies (1938) 
to illustrate the routine procedure of fitting the polynomials. We shall not, therefore, give 
details of the initial calculations but refer the reader instead to Anderson & Houseman’s 
bulletin. 


Table 3. Price of sugar, 1875-1936 








Year | Price | Year | Price | Year | Price | Year | Price
1875 | 67    | 1891 | 6     | 1907 | 6     | 1923 | 44
1876 | 65    | 1892 | 3     | 1908 | 10    | 1924 | 35
1877 | 73    | 1893 | 8     | 1909 | 8     | 1925 | 15
1878 | 55    | 1894 | 1     | 1910 | 10    | 1926 | 15
1879 | 48    | 1895 | 2     | 1911 | 13    | 1927 | 18
1880 | 56    | 1896 | 5     | 1912 | 10    | 1928 | 15
1881 | 57    | 1897 | 5     | 1913 | 3     | 1929 | 10
1882 | 52    | 1898 | 10    | 1914 | 7     | 1930 | 6
1883 | 45    | 1899 | 9     | 1915 | 16    | 1931 | 4
1884 | 28    | 1900 | 13    | 1916 | 29    | 1932 | 0
1885 | 24    | 1901 | 10    | 1917 | 37    | 1933 | 3
1886 | 21    | 1902 | 5     | 1918 | 38    | 1934 | 1
1887 | 20    | 1903 | 6     | 1919 | 50    | 1935 | 3
1888 | 30    | 1904 | 8     | 1920 | 74    | 1936 | 7
1889 | 36    | 1905 | 13    | 1921 | 22    |      |
1890 | 22    | 1906 | 5     | 1922 | 19    |      |
































Example 3. A polynomial time trend is to be fitted to 62 annual sugar prices (1875-1936) 
given in Table 3. The prices are in terms of mills (tenths of a cent) coded by subtracting 40 
mills from each price. 

Anderson & Houseman find the following values for the sums of squares and products, 
giving the regression coefficients shown: 


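$$\begin{array}{ll}
\Sigma y = 1{,}336 & b_0 = 21.548 \\
\Sigma(y - \bar y)^2 = 25{,}250 & \\
\Sigma(\xi'_1)^2 = 79{,}422 & \Sigma(\xi'_2)^2 = 1{,}270{,}752 \\
\Sigma(\xi'_3)^2 = 139{,}238{,}112 & \Sigma(\xi'_4)^2 = 103{,}639{,}568{,}032 \\
\Sigma(y\xi'_1) = -20{,}286 & b_1 = -0.2554 \\
\Sigma(y\xi'_2) = 72{,}775 & b_2 = 0.05727 \\
\Sigma(y\xi'_3) = -1{,}080{,}557 & b_3 = -0.0077605 \\
\Sigma(y\xi'_4) = -7{,}599{,}201 & b_4 = -0.00007332
\end{array}$$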
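Because the polynomials are orthogonal, each coefficient is a simple ratio and the residual sum of squares follows by successive subtraction. The sketch below uses only the sums quoted above; the names are illustrative.

```python
sum_sq_dev_y = 25250.0
sum_xi_sq = [79422.0, 1270752.0, 139238112.0, 103639568032.0]   # sums of squares of xi'_1..xi'_4
sum_y_xi = [-20286.0, 72775.0, -1080557.0, -7599201.0]          # sums of products with y

# With orthogonal regressors each coefficient is a simple ratio.
b = [sy / sx for sy, sx in zip(sum_y_xi, sum_xi_sq)]

def residual_ss(degree):
    """Residual sum of squares after fitting polynomial terms up to the given degree."""
    return sum_sq_dev_y - sum(b[i] * sum_y_xi[i] for i in range(degree))

print([round(v, 7) for v in b], round(residual_ss(3)))
```

Carrying the coefficients at full precision changes the residual sum of squares only in the first decimal place compared with working from the rounded values.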











To test for serial correlation we must calculate

$$d = \frac{\Sigma(\Delta z)^2}{\Sigma z^2}.$$

Taking first the case in which terms only as far as $\xi'_3$ are fitted,

$$z = y - b_0 - b_1\xi'_1 - b_2\xi'_2 - b_3\xi'_3$$

and

$$\Sigma z^2 = 25{,}250 - (-0.2554)(-20{,}286) - (0.05727)(72{,}775) - (-0.0077605)(-1{,}080{,}557) = 7515, \tag{13}$$

$$\Sigma(\Delta z)^2 = \Sigma(\Delta y)^2 - 2b_1\Sigma\Delta y\,\Delta\xi'_1 - 2b_2\Sigma\Delta y\,\Delta\xi'_2 - 2b_3\Sigma\Delta y\,\Delta\xi'_3 + \sum_{i=1}^{3}\sum_{j=1}^{3} b_ib_j\,\Sigma\Delta\xi'_i\,\Delta\xi'_j. \tag{14}$$


&(Ay)? is calculated directly as the sum of the squares of the first differences of the series of 
observations of y; its value here is 2590. For the remaining terms, indirect methods are much 
quicker. It may be verified that 


DAy AEs = (Yn —Y1) 46;(0), ) 
Ay Ag, = — 2AgLy — (Yn +43) 4E3(0), 


EAy Ag, = —A9 rye: + (yy — ys) AE5(0), 
1 . , (15) 

2 
Bay Ag, = — Aq S¥bi+ [F wt— 1-9 By | — (9, +4) 00), 





: 20. ., {1 10n?—13) — ,, : 
LAy Ag; = ale Lyé3 + li (3n? — 7) ie 4 aad Zy6; | + (Yn —Y1) Ag,(0), 


and that, with $i \le j$,

$$\sum\Delta\xi'_i\,\Delta\xi'_j = \begin{cases} -2\,\xi'_j(1)\,\Delta\xi'_i(0) & (i+j \text{ even}),\\ 0 & (i+j \text{ odd}). \end{cases} \tag{16}$$

In these expressions it is assumed that the original time variate $x$ takes the values $1, 2, \ldots, n$. $\xi'_i(1)$ denotes the value of $\xi'_i$ for $x = 1$. Similarly, $\Delta\xi'_i(0)$ denotes the value of $\Delta\xi'_i$ for $x = 0$, i.e. $\xi'_i(1) - \xi'_i(0)$. $y_1$ and $y_n$ are the first and last observations in the series of values of the dependent variate. The $\lambda_i$ is given for each $n$ at the foot of the appropriate column of $\xi'_i$ values in the published tables.
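The identity for the sums of products of first differences can be checked by brute force. The sketch below builds the first three polynomials for n = 62 in exact arithmetic and compares the directly computed sums with the closed form. The scale factors 2, 1/2 and 1/3 are an assumption: they are the values that reproduce the sums of squares quoted in Example 3, not figures read from the published tables.

```python
from fractions import Fraction as F

n = 62
LAMBDA = {1: F(2), 2: F(1, 2), 3: F(1, 3)}   # assumed scale factors for n = 62

def xi(i, x):
    """Scaled orthogonal polynomial of degree i at time x (x = 0 is allowed)."""
    u = F(2 * x - n - 1, 2)                   # time measured from the mid-point
    if i == 1:
        poly = u
    elif i == 2:
        poly = u * u - F(n * n - 1, 12)
    else:
        poly = u ** 3 - u * F(3 * n * n - 7, 20)
    return LAMBDA[i] * poly

def delta(i, x):
    """First difference xi_i(x + 1) - xi_i(x)."""
    return xi(i, x + 1) - xi(i, x)

def direct(i, j):
    """Sum over x = 1..n-1 of the product of first differences."""
    return sum(delta(i, x) * delta(j, x) for x in range(1, n))

def by_formula(i, j):
    """Closed form: -2 xi_j(1) delta_i(0) when i + j is even, otherwise 0."""
    i, j = min(i, j), max(i, j)
    return -2 * xi(j, 1) * delta(i, 0) if (i + j) % 2 == 0 else F(0)

sums_of_squares = [sum(xi(i, x) ** 2 for x in range(1, n + 1)) for i in (1, 2, 3)]
print(sums_of_squares, [delta(i, 0) for i in (1, 2, 3)])
```

The same functions give the starting values needed in the worked example, such as `xi(3, 1)` and `delta(3, 0)`, without preparing a difference table by hand.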

The values of $\Delta\xi'_i(0)$ are obtained by writing down the first few terms of the series $\xi'_i(x)$ for $x = 1, 2, \ldots$, and preparing a small table of differences. $\Delta\xi'_i(0)$ is then found by simple addition. It should be noted that the values of $\xi'_i(x)$ should be read upwards starting at the bottom of the published table, and that the signs should be reversed for polynomials of odd

degree. Example 3, with $n = 62$, gives the following lay-out. The values of $\Delta\xi'_i(0)$ required are printed in italics:


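[Difference table for $n = 62$: columns $x$, $\xi'_i$, $\Delta\xi'_i$, $\Delta^2\xi'_i$, ... for $i = 1, 2, 3$. The entries used below are $\xi'_1(1) = -61$, $\Delta\xi'_1(0) = 2$; $\xi'_2(1) = 305$, $\Delta\xi'_2(0) = -31$; $\xi'_3(1) = -3599$, $\Delta\xi'_3(0) = 769$.]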






















J. DuRBIN AND G. S. Watson 171 




Substituting in formulae (15, 16) we have

$$\begin{aligned}
\sum\Delta y\,\Delta\xi'_1 &= (7-67)(2) = -120,\\
\sum\Delta y\,\Delta\xi'_2 &= (-2)(\tfrac{1}{2})(1336) - (7+67)(-31) = 958,\\
\sum\Delta y\,\Delta\xi'_3 &= -\frac{6(\tfrac{1}{3})}{2}(-20{,}286) + (7-67)(769) = -25{,}854,\\
\sum(\Delta\xi'_1)^2 &= -2(-61)(2) = 244,\\
\sum(\Delta\xi'_2)^2 &= -2(305)(-31) = 18{,}910,\\
\sum(\Delta\xi'_3)^2 &= -2(-3599)(769) = 5{,}535{,}262,\\
\sum\Delta\xi'_1\,\Delta\xi'_2 &= \sum\Delta\xi'_2\,\Delta\xi'_3 = 0,\\
\sum\Delta\xi'_1\,\Delta\xi'_3 &= -2(-3599)(2) = 14{,}396.
\end{aligned}$$

We now have all the quantities necessary for the calculation of $\sum(\Delta z)^2$. Substituting in (14) we have

$$\begin{aligned}
\sum(\Delta z)^2 &= 2590 - 2(-0.2554)(-120) - 2(0.05727)(958)\\
&\quad - 2(-0.0077605)(-25{,}854) + 2(-0.2554)(-0.0077605)(14{,}396)\\
&\quad + (0.2554)^2(244) + (0.05727)^2(18{,}910) + (0.0077605)^2(5{,}535{,}262)\\
&= 2486.06.
\end{aligned}$$

Thus

$$d = \frac{2486.06}{7516} = 0.3308.$$
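The substitution into (14) is easy to mistype, so a small numerical sketch may help. It uses only the figures quoted in the example; the dictionary layout and names are illustrative choices, not part of the original method.

```python
# Regression coefficients b_1..b_3 and the sums quoted in Example 3.
b = {1: -0.2554, 2: 0.05727, 3: -0.0077605}
sum_dy2 = 2590                                  # sum of squared first differences of y
dy_dxi = {1: -120, 2: 958, 3: -25854}           # sums of (delta y)(delta xi'_i)
dxi_dxi = {(1, 1): 244, (2, 2): 18910, (3, 3): 5535262, (1, 3): 14396}

def cross(i, j):
    """Sum of (delta xi'_i)(delta xi'_j); zero when i + j is odd."""
    return dxi_dxi.get((min(i, j), max(i, j)), 0)

sum_dz2 = (
    sum_dy2
    - 2 * sum(b[i] * dy_dxi[i] for i in b)
    + sum(b[i] * b[j] * cross(i, j) for i in b for j in b)
)
d = sum_dz2 / 7516                              # denominator as used in the text

print(round(sum_dz2, 2), round(d, 4))
```

The double sum runs over all ordered pairs, so each off-diagonal product is counted twice, which is where the factor 2 on the cross term comes from.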


Reference to Table 6 shows that this value is significant at the 1 % level and therefore 
provides significant evidence of the existence of positive serial correlation. 
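Reading the tables can be sketched as follows. This assumes the usual bounds rule for a test against positive serial correlation (a value below the lower bound is significant, a value above the upper bound is not, and anything between is inconclusive). The bounds passed in are those of the n = 60, k' = 3 row of Table 6, the nearest tabulated n below 62; the function name is hypothetical.

```python
def bounds_test(d, d_lower, d_upper):
    """Classify d against tabulated lower and upper significance bounds."""
    if d < d_lower:
        return "significant"
    if d > d_upper:
        return "not significant"
    return "inconclusive"

# Example 3: d = 0.3308 with a cubic trend, 1% bounds for n = 60, k' = 3.
print(bounds_test(0.3308, 1.32, 1.52))
```

Since the lower bound rises with n, using the n = 60 row for 62 observations does not change the verdict here.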

When a polynomial of the fourth degree is fitted to the data a value of d of 0-3603 is 
obtained. This still remains highly significant at the 1 % level. 


Mean and variance of d 


The quickest way of calculating the mean and variance of d for polynomial regressions is 
to use the numerical procedure described below. It is, however, possible to obtain explicit 
formulae, and these have been calculated for polynomials up to the fifth degree. Owing to 
the complexity of the resulting expressions we shall only give them for k’ = 1, 2, 3. 

Taking first the numerical procedure, let $\xi_1, \ldots, \xi_{k'}$ denote the column vectors of values of the polynomials (the usual prime is omitted in the vector form since we wish to use the same sign for the matrix operation of transposition). The mean and variance of $d$ are given by (5) and (6) with

$$P = 2(n-1) - \sum_{i=1}^{k'}\frac{\xi_i'A\xi_i}{\xi_i'\xi_i},$$

$$Q = 2(3n-4) - 2\sum_{i=1}^{k'}\frac{\xi_i'A^2\xi_i}{\xi_i'\xi_i} + \sum_{i=1}^{k'}\left(\frac{\xi_i'A\xi_i}{\xi_i'\xi_i}\right)^2 + 2\sum_{i<j}\frac{(\xi_i'A\xi_j)^2}{\xi_i'\xi_i\;\xi_j'\xi_j},$$

as was shown in Part I, $A$ being the matrix defined in § 4. The quantities $\xi_i'\xi_i$ are the sums of squares of the values of $\xi'_i$, given at the foot of the $\xi'$ tables. The quantities $\xi_i'A\xi_j$ and $\xi_i'A^2\xi_i$ may be found from the expressions

$$\xi_i'A\xi_j = \begin{cases} -2\,\xi'_j(1)\,\Delta\xi'_i(0) & (i+j \text{ even}),\\ 0 & (i+j \text{ odd}), \end{cases}$$

$$\xi_i'A^2\xi_i = 2\,\xi'_i(1)\,\Delta^3\xi'_i(-1) + 2\,\Delta\xi'_i(0)\,\Delta\xi'_i(1).$$

The values of $\xi'_i(x)$ and its differences may easily be found from the published tables.











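Applying the method to Example 3 we find

$$P = 122 - \frac{244}{79{,}422} - \frac{18{,}910}{1{,}270{,}752} - \frac{5{,}535{,}262}{139{,}238{,}112} = 121.9423,$$

$$\begin{aligned}
Q &= 364 - 2\left(\frac{8}{79{,}422} + \frac{1860}{1{,}270{,}752} + \frac{1{,}074{,}508}{139{,}238{,}112}\right)\\
&\quad + \left(\frac{244}{79{,}422}\right)^2 + \left(\frac{18{,}910}{1{,}270{,}752}\right)^2 + \left(\frac{5{,}535{,}262}{139{,}238{,}112}\right)^2 + \frac{2(14{,}396)^2}{79{,}422 \times 139{,}238{,}112}\\
&= 363.9832.
\end{aligned}$$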
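The numerical procedure can be sketched in a few lines. The form of Q used here (leading term 2(3n - 4), minus twice the ratios involving the squared matrix, plus the squared ratios and twice the squared cross terms) is my reading of the formula carried over from Part I, so treat it as an assumption; the input figures are those quoted for Example 3.

```python
n = 62
xx = [79422, 1270752, 139238112]        # xi_i' xi_i      (sums of squares)
xAx = [244, 18910, 5535262]             # xi_i' A xi_i    (sums of squared first differences)
xA2x = [8, 1860, 1074508]               # xi_i' A^2 xi_i
cross = {(0, 2): 14396}                 # xi_1' A xi_3; the other cross terms vanish

P = 2 * (n - 1) - sum(a / s for a, s in zip(xAx, xx))

Q = (
    2 * (3 * n - 4)
    - 2 * sum(a2 / s for a2, s in zip(xA2x, xx))
    + sum((a / s) ** 2 for a, s in zip(xAx, xx))
    + 2 * sum(c * c / (xx[i] * xx[j]) for (i, j), c in cross.items())
)

print(round(P, 4), round(Q, 4))
```

Every correction term is small compared with the leading terms, so P stays just under 2(n - 1) = 122 and Q just under 2(3n - 4) = 364.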





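The explicit formulae for $P$ and $Q$ are as follows:

$$P = 2\left(n - 1 - \sum_{i=1}^{k'} p_i\right),$$

where

$$p_1 = \frac{6}{n(n+1)}, \qquad p_2 = \frac{30}{(n+1)(n+2)}, \qquad p_3 = \frac{84(n^2+1)}{n(n+1)(n+2)(n+3)},$$

and

$$Q = 2\left(3n - 4 + \sum_{i=1}^{k'} q_i\right),$$

where

$$q_1 = 2p_1^2 - \frac{24}{n(n^2-1)}, \qquad q_2 = 2p_2^2 - \frac{360}{(n^2-1)(n+2)},$$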
336(n —2)(n—3 336(3n2 + 10n—7 
a5 = 2ph+—< 380" = 2-3) ) 


n(n +1)?(n+2)(n+3) n(n?—1)(n+2)(n+3)° 
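The explicit expressions for the first two degrees can be compared with the numerical procedure in exact arithmetic. The sketch assumes that A is the usual matrix of the quadratic form in successive differences (diagonal 1, 2, ..., 2, 1 with -1 on the adjacent diagonals), since its definition lies in an earlier section, and it covers only the linear and quadratic terms. Function names are illustrative.

```python
from fractions import Fraction as F

def apply_A(v):
    """Multiply by A: each entry is the sum of its differences from its neighbours."""
    n = len(v)
    out = []
    for t in range(n):
        left = v[t] - v[t - 1] if t > 0 else 0
        right = v[t] - v[t + 1] if t < n - 1 else 0
        out.append(left + right)
    return out

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def explicit_matches_numerical(n):
    u = [F(2 * t - n - 1, 2) for t in range(1, n + 1)]
    polys = [u, [v * v - F(n * n - 1, 12) for v in u]]     # degrees 1 and 2
    ratios = [dot(x, apply_A(x)) / dot(x, x) for x in polys]
    P = 2 * (n - 1) - sum(ratios)
    Q = (
        2 * (3 * n - 4)
        - 2 * sum(dot(apply_A(x), apply_A(x)) / dot(x, x) for x in polys)
        + sum(r * r for r in ratios)
        + 2 * dot(polys[0], apply_A(polys[1])) ** 2 / (dot(polys[0], polys[0]) * dot(polys[1], polys[1]))
    )
    p1, p2 = F(6, n * (n + 1)), F(30, (n + 1) * (n + 2))
    q1 = 2 * p1 ** 2 - F(24, n * (n * n - 1))
    q2 = 2 * p2 ** 2 - F(360, (n * n - 1) * (n + 2))
    return P == 2 * (n - 1 - p1 - p2) and Q == 2 * (3 * n - 4 + q1 + q2)

print([explicit_matches_numerical(n) for n in (10, 15, 62)])
```

Because the ratios do not depend on the scale of each polynomial, the unscaled forms are enough for this comparison.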


Appendix on the calculation of the tables 


1. Tables 4, 5 and 6. The exact distributions of $d_L$ and $d_U$, whose significance points are required for Tables 4, 5 and 6, are not known. When transformed to the range (0, 1), their probability densities may, however, be represented by the series

$$g(x) = \frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\left\{1 + \sum_{s=1}^{\infty} a_sG_s(x)\right\}, \tag{17}$$

where the $G$'s are the polynomials (Jacobi) which are orthogonal on the range (0, 1) with respect to the weight function $x^{p-1}(1-x)^{q-1}/B(p,q)$. These polynomials are defined by (see Courant & Hilbert,* 1931)

$$G_s(x) = 1 - \binom{s}{1}\frac{p+q-1+s}{p}\,x + \binom{s}{2}\frac{(p+q-1+s)(p+q+s)}{p(p+1)}\,x^2 - \ldots + (-1)^s\,\frac{(p+q-1+s)(p+q+s)\ldots(p+q+2s-2)}{p(p+1)\ldots(p+s-1)}\,x^s.$$

The coefficients $a_s$ may be determined by the method of moments; the distribution of $d_L$ and $d_U$ is then a series of Incomplete Beta Functions.


* Note, however, the misprint: $x^{q}(1-x)^{p-q}$ should read $x^{q-1}(1-x)^{p-q}$.
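The orthogonality that makes the series (17) workable can be checked exactly for integer p and q. The sketch below writes each polynomial through its coefficients (alternating binomial coefficients times rising products, which is the form I take the definition above to have) and integrates products against the Beta weight using the Beta moments. The function names are illustrative.

```python
from fractions import Fraction as F
from math import comb

def jacobi_coeffs(s, p, q):
    """Coefficients of 1, x, ..., x^s in G_s for the weight x^(p-1) (1-x)^(q-1)."""
    coeffs = []
    for j in range(s + 1):
        c = F((-1) ** j * comb(s, j))
        for i in range(j):
            c *= F(p + q - 1 + s + i, p + i)
        coeffs.append(c)
    return coeffs

def beta_moment(a, p, q):
    """E[x^a] under the Beta(p, q) density."""
    m = F(1)
    for i in range(a):
        m *= F(p + i, p + q + i)
    return m

def inner(s, t, p, q):
    """Integral of G_s G_t against the Beta(p, q) density on (0, 1)."""
    cs, ct = jacobi_coeffs(s, p, q), jacobi_coeffs(t, p, q)
    return sum(
        cs[j] * ct[k] * beta_moment(j + k, p, q)
        for j in range(len(cs))
        for k in range(len(ct))
    )

print([[inner(s, t, 3, 5) == 0 for t in range(5)] for s in range(5)])
```

With p = q = 1 the weight is uniform and the same coefficients give the shifted Legendre polynomials, for example 1 - 6x + 6x^2 for the second degree.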

























The weight function was chosen to be the density of the Beta-distribution with the correct mean and variance. (An alternative weight function giving the right order of vanishing of (17) at $x = 0$ and $x = 1$ was also tried but it was found to be less satisfactory.) With this weight function, the coefficients $a_1$ and $a_2$ in (17) were zero. Terms as far as $G_4(x)$ were used.


Table 4. Significance points of d_L and d_U: 5%

        k' = 1        k' = 2        k' = 3        k' = 4        k' = 5
  n    d_L   d_U     d_L   d_U     d_L   d_U     d_L   d_U     d_L   d_U
 15   1.08  1.36    0.95  1.54    0.82  1.75    0.69  1.97    0.56  2.21
 16   1.10  1.37    0.98  1.54    0.86  1.73    0.74  1.93    0.62  2.15
 17   1.13  1.38    1.02  1.54    0.90  1.71    0.78  1.90    0.67  2.10
 18   1.16  1.39    1.05  1.53    0.93  1.69    0.82  1.87    0.71  2.06
 19   1.18  1.40    1.08  1.53    0.97  1.68    0.86  1.85    0.75  2.02
 20   1.20  1.41    1.10  1.54    1.00  1.68    0.90  1.83    0.79  1.99
 21   1.22  1.42    1.13  1.54    1.03  1.67    0.93  1.81    0.83  1.96
 22   1.24  1.43    1.15  1.54    1.05  1.66    0.96  1.80    0.86  1.94
 23   1.26  1.44    1.17  1.54    1.08  1.66    0.99  1.79    0.90  1.92
 24   1.27  1.45    1.19  1.55    1.10  1.66    1.01  1.78    0.93  1.90
 25   1.29  1.45    1.21  1.55    1.12  1.66    1.04  1.77    0.95  1.89
 26   1.30  1.46    1.22  1.55    1.14  1.65    1.06  1.76    0.98  1.88
 27   1.32  1.47    1.24  1.56    1.16  1.65    1.08  1.76    1.01  1.86
 28   1.33  1.48    1.26  1.56    1.18  1.65    1.10  1.75    1.03  1.85
 29   1.34  1.48    1.27  1.56    1.20  1.65    1.12  1.74    1.05  1.84
 30   1.35  1.49    1.28  1.57    1.21  1.65    1.14  1.74    1.07  1.83
 31   1.36  1.50    1.30  1.57    1.23  1.65    1.16  1.74    1.09  1.83
 32   1.37  1.50    1.31  1.57    1.24  1.65    1.18  1.73    1.11  1.82
 33   1.38  1.51    1.32  1.58    1.26  1.65    1.19  1.73    1.13  1.81
 34   1.39  1.51    1.33  1.58    1.27  1.65    1.21  1.73    1.15  1.81
 35   1.40  1.52    1.34  1.58    1.28  1.65    1.22  1.73    1.16  1.80
 36   1.41  1.52    1.35  1.59    1.29  1.65    1.24  1.73    1.18  1.80
 37   1.42  1.53    1.36  1.59    1.31  1.66    1.25  1.72    1.19  1.80
 38   1.43  1.54    1.37  1.59    1.32  1.66    1.26  1.72    1.21  1.79
 39   1.43  1.54    1.38  1.60    1.33  1.66    1.27  1.72    1.22  1.79
 40   1.44  1.54    1.39  1.60    1.34  1.66    1.29  1.72    1.23  1.79
 45   1.48  1.57    1.43  1.62    1.38  1.67    1.34  1.72    1.29  1.78
 50   1.50  1.59    1.46  1.63    1.42  1.67    1.38  1.72    1.34  1.77
 55   1.53  1.60    1.49  1.64    1.45  1.68    1.41  1.72    1.38  1.77
 60   1.55  1.62    1.51  1.65    1.48  1.69    1.44  1.73    1.41  1.77
 65   1.57  1.63    1.54  1.66    1.50  1.70    1.47  1.73    1.44  1.77
 70   1.58  1.64    1.55  1.67    1.52  1.70    1.49  1.74    1.46  1.77
 75   1.60  1.65    1.57  1.68    1.54  1.71    1.51  1.74    1.49  1.77
 80   1.61  1.66    1.59  1.69    1.56  1.72    1.53  1.74    1.51  1.77
 85   1.62  1.67    1.60  1.70    1.57  1.72    1.55  1.75    1.52  1.77
 90   1.63  1.68    1.61  1.70    1.59  1.73    1.57  1.75    1.54  1.78
 95   1.64  1.69    1.62  1.71    1.60  1.73    1.58  1.75    1.56  1.78
100   1.65  1.69    1.63  1.72    1.61  1.74    1.59  1.76    1.57  1.78









































A first set of significance points was obtained using the weight function as a first approximation; these values were then adjusted using the higher terms of the series. The first set of values was calculated partly by Wise's* (1950) method and partly by means of Carter's (1947) approximation. The adjustments necessary were found to be small and to vary very slowly with $p$ and $q$; they could therefore be calculated by the following method which reduces to a minimum interpolation in the Tables of the Incomplete Beta Function (1948).


* We are indebted to Mr Wise for some helpful correspondence on his method. 













First, $p$ and $q$ may be replaced by the integers nearest them. For these integers an exact significance point may be found by quadratic inverse interpolation. The difference of this exact point and the first approximation to it is the required adjustment. The adjustment required for the fourth moment turned out to be negligible to the order of accuracy aimed at and could have been ignored. The adjustments were so small and regular that it was only


Table 5. Significance points of d_L and d_U: 2.5%

        k' = 1        k' = 2        k' = 3        k' = 4        k' = 5
  n    d_L   d_U     d_L   d_U     d_L   d_U     d_L   d_U     d_L   d_U
 15   0.95  1.23    0.83  1.40    0.71  1.61    0.59  1.84    0.48  2.09
 16   0.98  1.24    0.86  1.40    0.75  1.59    0.64  1.80    0.53  2.03
 17   1.01  1.25    0.90  1.40    0.79  1.58    0.68  1.77    0.57  1.98
 18   1.03  1.26    0.93  1.40    0.82  1.56    0.72  1.74    0.62  1.93
 19   1.06  1.28    0.96  1.41    0.86  1.55    0.76  1.72    0.66  1.90
 20   1.08  1.28    0.99  1.41    0.89  1.55    0.79  1.70    0.70  1.87
 21   1.10  1.30    1.01  1.41    0.92  1.54    0.83  1.69    0.73  1.84
 22   1.12  1.31    1.04  1.42    0.95  1.54    0.86  1.68    0.77  1.82
 23   1.14  1.32    1.06  1.42    0.97  1.54    0.89  1.67    0.80  1.80
 24   1.16  1.33    1.08  1.43    1.00  1.54    0.91  1.66    0.83  1.79
 25   1.18  1.34    1.10  1.43    1.02  1.54    0.94  1.65    0.86  1.77
 26   1.19  1.35    1.12  1.44    1.04  1.54    0.96  1.65    0.88  1.76
 27   1.21  1.36    1.13  1.44    1.06  1.54    0.99  1.64    0.91  1.75
 28   1.22  1.37    1.15  1.45    1.08  1.54    1.01  1.64    0.93  1.74
 29   1.24  1.38    1.17  1.45    1.10  1.54    1.03  1.63    0.96  1.73
 30   1.25  1.38    1.18  1.46    1.12  1.54    1.05  1.63    0.98  1.73
 31   1.26  1.39    1.20  1.47    1.13  1.55    1.07  1.63    1.00  1.72
 32   1.27  1.40    1.21  1.47    1.15  1.55    1.08  1.63    1.02  1.71
 33   1.28  1.41    1.22  1.48    1.16  1.55    1.10  1.63    1.04  1.71
 34   1.29  1.41    1.24  1.48    1.17  1.55    1.12  1.63    1.06  1.70
 35   1.30  1.42    1.25  1.48    1.19  1.55    1.13  1.63    1.07  1.70
 36   1.31  1.43    1.26  1.49    1.20  1.56    1.15  1.63    1.09  1.70
 37   1.32  1.43    1.27  1.49    1.21  1.56    1.16  1.62    1.10  1.70
 38   1.33  1.44    1.28  1.50    1.23  1.56    1.17  1.62    1.12  1.70
 39   1.34  1.44    1.29  1.50    1.24  1.56    1.19  1.63    1.13  1.69
 40   1.35  1.45    1.30  1.51    1.25  1.57    1.20  1.63    1.15  1.69
 45   1.39  1.48    1.34  1.53    1.30  1.58    1.25  1.63    1.21  1.69
 50   1.42  1.50    1.38  1.54    1.34  1.59    1.30  1.64    1.26  1.69
 55   1.45  1.52    1.41  1.56    1.37  1.60    1.33  1.64    1.30  1.69
 60   1.47  1.54    1.44  1.57    1.40  1.61    1.37  1.65    1.33  1.69
 65   1.49  1.55    1.46  1.59    1.43  1.62    1.40  1.66    1.36  1.69
 70   1.51  1.57    1.48  1.60    1.45  1.63    1.42  1.66    1.39  1.70
 75   1.53  1.58    1.50  1.61    1.47  1.64    1.45  1.67    1.42  1.70
 80   1.54  1.59    1.52  1.62    1.49  1.65    1.47  1.67    1.44  1.70
 85   1.56  1.60    1.53  1.63    1.51  1.65    1.49  1.68    1.46  1.71
 90   1.57  1.61    1.55  1.64    1.53  1.66    1.50  1.69    1.48  1.71
 95   1.58  1.62    1.56  1.65    1.54  1.67    1.52  1.69    1.50  1.71
100   1.59  1.63    1.57  1.65    1.55  1.67    1.53  1.70    1.51  1.72









































necessary to calculate them at 39 places in the entire set of tables, the remainder being 
obtained by linear interpolation. The adjustments were negligible for numbers of obser- 
vations greater than 40. 

As a partial check on the calculating procedure it was applied to the calculation of the 
significance points for a related distribution for which exact significance points were avail- 
able. To make the circumstances as unfavourable as possible the case n = 16, k’ =5 at the 


































Table 6. Significance points of d_L and d_U: 1%

        k' = 1        k' = 2        k' = 3        k' = 4        k' = 5
  n    d_L   d_U     d_L   d_U     d_L   d_U     d_L   d_U     d_L   d_U
 15   0.81  1.07    0.70  1.25    0.59  1.46    0.49  1.70    0.39  1.96
 16   0.84  1.09    0.74  1.25    0.63  1.44    0.53  1.66    0.44  1.90
 17   0.87  1.10    0.77  1.25    0.67  1.43    0.57  1.63    0.48  1.85
 18   0.90  1.12    0.80  1.26    0.71  1.42    0.61  1.60    0.52  1.80
 19   0.93  1.13    0.83  1.26    0.74  1.41    0.65  1.58    0.56  1.77
 20   0.95  1.15    0.86  1.27    0.77  1.41    0.68  1.57    0.60  1.74
 21   0.97  1.16    0.89  1.27    0.80  1.41    0.72  1.55    0.63  1.71
 22   1.00  1.17    0.91  1.28    0.83  1.40    0.75  1.54    0.66  1.69
 23   1.02  1.19    0.94  1.29    0.86  1.40    0.77  1.53    0.70  1.67
 24   1.04  1.20    0.96  1.30    0.88  1.41    0.80  1.53    0.72  1.66
 25   1.05  1.21    0.98  1.30    0.90  1.41    0.83  1.52    0.75  1.65
 26   1.07  1.22    1.00  1.31    0.93  1.41    0.85  1.52    0.78  1.64
 27   1.09  1.23    1.02  1.32    0.95  1.41    0.88  1.51    0.81  1.63
 28   1.10  1.24    1.04  1.32    0.97  1.41    0.90  1.51    0.83  1.62
 29   1.12  1.25    1.05  1.33    0.99  1.42    0.92  1.51    0.85  1.61
 30   1.13  1.26    1.07  1.34    1.01  1.42    0.94  1.51    0.88  1.61
 31   1.15  1.27    1.08  1.34    1.02  1.42    0.96  1.51    0.90  1.60
 32   1.16  1.28    1.10  1.35    1.04  1.43    0.98  1.51    0.92  1.60
 33   1.17  1.29    1.11  1.36    1.05  1.43    1.00  1.51    0.94  1.59
 34   1.18  1.30    1.13  1.36    1.07  1.43    1.01  1.51    0.95  1.59
 35   1.19  1.31    1.14  1.37    1.08  1.44    1.03  1.51    0.97  1.59
 36   1.21  1.32    1.15  1.38    1.10  1.44    1.04  1.51    0.99  1.59
 37   1.22  1.32    1.16  1.38    1.11  1.45    1.06  1.51    1.00  1.59
 38   1.23  1.33    1.18  1.39    1.12  1.45    1.07  1.52    1.02  1.58
 39   1.24  1.34    1.19  1.39    1.14  1.45    1.09  1.52    1.03  1.58
 40   1.25  1.34    1.20  1.40    1.15  1.46    1.10  1.52    1.05  1.58
 45   1.29  1.38    1.24  1.42    1.20  1.48    1.16  1.53    1.11  1.58
 50   1.32  1.40    1.28  1.45    1.24  1.49    1.20  1.54    1.16  1.59
 55   1.36  1.43    1.32  1.47    1.28  1.51    1.25  1.55    1.21  1.59
 60   1.38  1.45    1.35  1.48    1.32  1.52    1.28  1.56    1.25  1.60
 65   1.41  1.47    1.38  1.50    1.35  1.53    1.31  1.57    1.28  1.61
 70   1.43  1.49    1.40  1.52    1.37  1.55    1.34  1.58    1.31  1.61
 75   1.45  1.50    1.42  1.53    1.39  1.56    1.37  1.59    1.34  1.62
 80   1.47  1.52    1.44  1.54    1.42  1.57    1.39  1.60    1.36  1.62
 85   1.48  1.53    1.46  1.55    1.43  1.58    1.41  1.60    1.39  1.63
 90   1.50  1.54    1.47  1.56    1.45  1.59    1.43  1.61    1.41  1.64
 95   1.51  1.55    1.49  1.57    1.47  1.60    1.45  1.62    1.42  1.64
100   1.52  1.56    1.50  1.58    1.48  1.60    1.46  1.63    1.44  1.65









































extreme of the tabulated values was examined. The distributions of $d_L$ and $d_U$ for these values were modified to give latent roots occurring in equal pairs, i.e. five pairs in all. The new roots were chosen to lie midway between the roots of the original distribution, thus preserving its asymmetry and general character. By pairing the roots in this way the exact significance points could be determined using results given by R. L. Anderson (1942). The significance points obtained by the approximate procedure agreed with these exact significance points to the order of accuracy required here.


7. AN EXACT BOUNDS TEST 


We have given elsewhere (Watson & Durbin, 1951) a general method of constructing exact tests of serial independence which do not require the use of circular definitions of the serial correlation coefficient. The method can be used to obtain the exact distributions of bounding statistics similar to $d_L$ and $d_U$. The advantage of having exact distributions is obtained at the cost of throwing away a certain amount of relevant information.

For testing the independence of the errors in a regression model a statistic $d'$ is defined which is a slight modification of $d$. If the number of observations is even, $2m$ say, then

$$d' = \frac{(z_1-z_2)^2 + \ldots + (z_{m-1}-z_m)^2 + (z_{m+1}-z_{m+2})^2 + \ldots + (z_{2m-1}-z_{2m})^2}{\sum_{1}^{2m} z^2},$$

and if it is odd, $2m+1$ say, then

$$d' = \frac{(z_1-z_2)^2 + \ldots + (z_{m-1}-z_m)^2 + (z_{m+2}-z_{m+3})^2 + \ldots + (z_{2m}-z_{2m+1})^2}{\sum_{1}^{2m+1} z^2},$$

where the $z$'s are the least-squares residuals. The only difference from $d$ is that one or two of the squared differences are omitted from the numerator of $d'$. Thus when $m$ is small a substantial fraction of the relevant information is sacrificed.
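A minimal sketch of the modified statistic follows, assuming the reading that the middle difference is dropped for an even number of residuals and the two middle differences for an odd number. The function name and the toy residuals are illustrative only.

```python
def d_prime(z):
    """Modified statistic: like d, but omitting the central squared difference(s)."""
    n = len(z)
    m = n // 2
    # 1-based index i labels the difference z_i - z_(i+1).
    omitted = {m} if n % 2 == 0 else {m, m + 1}
    numerator = sum(
        (z[i - 1] - z[i]) ** 2 for i in range(1, n) if i not in omitted
    )
    return numerator / sum(v * v for v in z)

print(d_prime([1, -1, 1, -1]), d_prime([1, -1, 1, -1, 1]))
```

Dropping the central terms splits the residuals into two halves with no difference spanning the gap, which is the price paid for an exactly known distribution of the bounds.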

The theory developed in Part I can be applied to show that $d'$ lies between two values $d'_L$ and $d'_U$. In contrast to $d$, exact significance points can be calculated for $d'_L$ and $d'_U$ using the results of R. L. Anderson (1942), except for the case of an even number of observations and an odd number of independent variables, for which the exact distribution of $d'_L$ has not been found. A short table of such values for odd numbers of observations is given in Table 7. This table may be used for testing the significance of an observed value of $d'$ in exactly the same way as Table 4 is used for testing the significance of an observed value of $d$.


Table 7. Significance points of d'_L and d'_U: 5%

        k' = 1        k' = 2        k' = 3        k' = 4        k' = 5
  n   d'_L  d'_U    d'_L  d'_U    d'_L  d'_U    d'_L  d'_U    d'_L  d'_U
 15   0.80  1.04    0.67  1.20    0.55  1.36      -     -       -     -
 17   0.89  1.11    0.77  1.24    0.65  1.38    0.54  1.57    0.44  1.74
 19   0.96  1.16    0.85  1.27    0.75  1.40    0.64  1.56    0.54  1.71
 21   1.02  1.20    0.92  1.30    0.82  1.42    0.72  1.56    0.63  1.69
 23   1.07  1.24    0.98  1.33    0.89  1.44    0.80  1.56    0.71  1.68









































The calculations required even for such a short table were very heavy, and our chief motive 
for including it is the satisfaction of demonstrating that the problem has an exact solution. 
In practice we ourselves would prefer to use Table 4 owing to the greater power and simplicity 
of the statistic d. 


We wish to express our most grateful thanks to Mrs E. G. Chambers and the computing staff of the Department of Applied Economics, Cambridge, for carrying out most of the calculations required for Tables 4-7; also to Miss J. Graham of the Division of Research Techniques, London School of Economics, for assisting with Table 7.













REFERENCES

ANDERSON, R. L. (1942). Ann. Math. Statist. 13, 1.
ANDERSON, R. L. & ANDERSON, T. W. (1950). Ann. Math. Statist. 21, 59.
ANDERSON, R. L. & HOUSEMAN, E. E. (1942). Tables of orthogonal polynomial values extended to N = 104. Res. Bull. Iowa St. Coll. no. 297.
CARTER, A. H. (1947). Biometrika, 34, 352.
COCHRAN, W. G. (1938). J.R. Statist. Soc., Suppl., 5, 171.
COURANT, R. & HILBERT, D. (1931). Methoden der Mathematischen Physik. Berlin: Julius Springer.
DURBIN, J. & WATSON, G. S. (1950). Biometrika, 37, 409.
FISHER, R. A. & YATES, F. (1948). Statistical Tables. Edinburgh: Oliver and Boyd.
PEARSON, K. (1948). Tables of the Incomplete Beta Function. Cambridge University Press.
PREST, A. R. (1949). Rev. Econ. Statist. 31, 33.
SCHULTZ, HENRY (1938). Theory and Measurement of Demand, pp. 674-7. University of Chicago Press.
SNEDECOR, G. W. (1937). Statistical Methods. Collegiate Press.
THOMPSON, CATHERINE (1941). Biometrika, 32, 151.
WATSON, G. S. & DURBIN, J. (1951). Exact tests of serial correlation using non-circular statistics. (To be published).
WISE, M. E. (1950). Biometrika, 37, 208.





Corrections to Part I (Durbin & Watson, 1950)

We are grateful to Prof. T. W. Anderson for pointing out an error in the section headed 'Inequalities on $\nu_1, \nu_2, \ldots, \nu_{n-k}$'. Part (b) of the lemma and its corollary have been correctly applied in the remaining parts of the paper, but are incorrectly stated.

The necessary corrections are as follows:

p. 415. The second paragraph beginning 'We therefore seek...' should read:

'We therefore seek inequalities on $\nu_1, \nu_2, \ldots, \nu_{n-k}$. For the sake of generality we suppose that certain of the regression vectors, say $s$ of them, coincide with latent vectors of $A$ (or are linear combinations of them). From the results of the previous section it follows that the problem is reduced to the consideration of $k-s$ arbitrary regression vectors, while $A$ may be supposed to have $s$ zero roots together with the roots of $A$ not associated with the $s$ latent vectors mentioned above. These roots may be renumbered so that

$$\lambda_1 \le \lambda_2 \le \ldots \le \lambda_{n-s}.$$

We proceed to show that

$$\lambda_i \le \nu_i \le \lambda_{i+k-s} \quad (i = 1, 2, \ldots, n-k).\text{'} \tag{3}$$

p. 415. The next to the last sentence should read:
'Allowing for cases in which regression vectors coincide with latent vectors of $A$ we have (3).'

p. 416. Lemma (b) should read:

'If $s$ of the columns of $X$ are linear combinations of $s$ of the latent vectors of $A$, and if the roots of $A$ associated with the remaining $n-s$ latent vectors of $A$ are renumbered so that

$$\lambda_1 \le \lambda_2 \le \ldots \le \lambda_{n-s},$$

then

$$\lambda_i \le \nu_i \le \lambda_{i+k-s} \quad (i = 1, 2, \ldots, n-k).\text{'}$$

p. 416. Corollary should read: '$r_L \le r \le r_U$, where

$$r_L = \frac{\sum_{i=1}^{n-k}\lambda_i\zeta_i^2}{\sum_{i=1}^{n-k}\zeta_i^2}$$

and

$$r_U = \frac{\sum_{i=1}^{n-k}\lambda_{i+k-s}\zeta_i^2}{\sum_{i=1}^{n-k}\zeta_i^2}.\text{'}$$



p. 416: line 16. ‘A,,,,,’ should read ‘A,_,,,’. ‘ 


The following misprints should also be noted. 


p. 414, line 13. The equation should read: 
‘| I,6-—(1,-1,}) A | = 0. 


p. 414, line 22. The equation should read: 
n n n 
* TI (@—A,)+ x BA, I (@—A,) = 0.’ 
j=1 i=1 j+i 





p- 417, last line but two. The formula should read: 
#(n—k-1) 
a= [1 (17,—7,) y(t;-7).’ 
j*i 























BI-VARIATE k-STATISTICS AND CUMULANTS OF THEIR JOINT
SAMPLING DISTRIBUTION


By M. B. COOK, University College, London 


1. The purpose of this paper is to derive 

(a) formulae giving bivariate population moment-coefficients in terms of cumulants, and 
cumulants in terms of moment-coefficients up to the sixth order; 

(b) formulae for the cumulants of the joint distribution of the simpler bivariate k-statistics. 

In both cases these formulae are extensions, to two variates, of univariate formulae which 
have been fairly completely worked out in the past. But while the bivariate problem has 
been touched on in both aspects by various writers, only the simplest results have been 
published by way of illustration, and a need has been felt for a more extended list of formulae, 
partly on account of their intrinsic interest and partly to pave the way for further work in 
the direction of approximations to sampling distributions, along lines which have proved 
fruitful using combinations of univariate k-statistics. 


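2. Bivariate k-statistics are sample estimates of bivariate cumulants, and are expressible in terms of sample estimates of bivariate moment coefficients. It is desirable, therefore, to begin by giving the relations which exist between bivariate population moment-coefficients and cumulants. More than one method can be used to derive these.

(a) Using the generating functions we have

$$1 + \mu'_{10}\frac{t_1}{1!} + \mu'_{01}\frac{t_2}{1!} + \mu'_{20}\frac{t_1^2}{2!} + \mu'_{11}\frac{t_1t_2}{1!\,1!} + \ldots = \exp\left[\kappa_{10}\frac{t_1}{1!} + \kappa_{01}\frac{t_2}{1!} + \kappa_{20}\frac{t_1^2}{2!} + \kappa_{11}\frac{t_1t_2}{1!\,1!} + \ldots\right]. \tag{1}$$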


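We may work out the coefficient of $t_1^p t_2^q/(p!\,q!)$ on the right side and equate it to the corresponding coefficient on the left, thus providing a formula for $\mu'_{pq}$ in terms of a series of cumulants. For example, the coefficient of $t_1t_2^2$ on the right of (1) is

$$\frac{\kappa_{12}}{1!\,2!} + \frac{1}{2!}\left(\frac{\kappa_{02}}{0!\,2!}\,\frac{2\kappa_{10}}{1!\,0!} + \frac{\kappa_{11}}{1!\,1!}\,\frac{2\kappa_{01}}{0!\,1!}\right) + \frac{1}{3!}\,\frac{3\kappa_{10}}{1!\,0!}\,\frac{\kappa_{01}}{0!\,1!}\,\frac{\kappa_{01}}{0!\,1!},$$

so that

$$\mu'_{12} = \kappa_{12} + \kappa_{02}\kappa_{10} + 2\kappa_{11}\kappa_{01} + \kappa_{01}^2\kappa_{10}.$$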











Inversely, we can take the natural logarithm of both sides of (1) and work out the coefficient of $t_1^p t_2^q/(p!\,q!)$ on the left side, equating it to the corresponding coefficient on the right. This will give a formula for $\kappa_{pq}$ in terms of a series of moment-coefficients about the origin.

By writing (1) as

$$1 + \mu_{20}\frac{t_1^2}{2!} + \mu_{11}\frac{t_1t_2}{1!\,1!} + \mu_{02}\frac{t_2^2}{2!} + \mu_{30}\frac{t_1^3}{3!} + \ldots = \exp\left[\kappa_{20}\frac{t_1^2}{2!} + \kappa_{11}\frac{t_1t_2}{1!\,1!} + \kappa_{02}\frac{t_2^2}{2!} + \kappa_{30}\frac{t_1^3}{3!} + \ldots\right], \tag{2}$$

in which the sum of the suffices of the $\mu$'s or $\kappa$'s is at least 2, and proceeding similarly, we get a formula for $\mu_{pq}$, the moment coefficient about the mean, in terms of a series of cumulants in which neither $\kappa_{10}$ nor $\kappa_{01}$ appears. The inverse formula is obtained by taking the natural logarithms of both sides.

(b) The process in (a) soon becomes laborious, but for any given order of result we can proceed, as indicated by Wishart (1928), by expanding (1) and (2) into their analogues for 3, 4 or more variates and then writing down a general result. Thus for three variates we should have

$$\mu'_{111} = \kappa_{111} + \kappa_{110}\kappa_{001} + \kappa_{101}\kappa_{010} + \kappa_{011}\kappa_{100} + \kappa_{100}\kappa_{010}\kappa_{001}, \tag{3}$$

which can, in fact, be derived directly by writing down all the combinations of one, two and three cumulants subject to the condition that the combined suffices should total 111. This is a simple combinatorial procedure. Suffices may then be amalgamated in (3), or extra noughts may be inserted. For example, adding the second and third suffices we have

$$\mu'_{12} = \kappa_{12} + 2\kappa_{11}\kappa_{01} + \kappa_{02}\kappa_{10} + \kappa_{10}\kappa_{01}^2.$$

Inversely we should note, for example, that

$$\kappa_{111} = \mu'_{111} - \mu'_{110}\mu'_{001} - \mu'_{101}\mu'_{010} - \mu'_{011}\mu'_{100} + 2\mu'_{100}\mu'_{010}\mu'_{001},$$

in which the term containing $p$ moment-coefficients is multiplied by $(-1)^{p-1}(p-1)!$.
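The combinatorial procedure amounts to enumerating the partitions of a set of p + q labelled items and recording, for each block, how many items of each kind it holds. The sketch below does this directly; it is an illustration of the counting argument rather than the method used to prepare the formulae, and the function names are hypothetical. Each term is keyed by the sorted tuple of its suffix pairs.

```python
from collections import Counter
from math import factorial

def set_partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                       # `first` alone in a new block
        for i in range(len(part)):                   # or added to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def expand(p, q, inverse=False):
    """Coefficients of mu'_pq in cumulants, or of kappa_pq in moments if inverse."""
    terms = Counter()
    for part in set_partitions(list(range(p + q))):
        key = tuple(sorted(
            (sum(i < p for i in blk), sum(i >= p for i in blk)) for blk in part
        ))
        r = len(part)
        terms[key] += (-1) ** (r - 1) * factorial(r - 1) if inverse else 1
    return dict(terms)

print(expand(1, 2))
print(expand(2, 1, inverse=True))
```

The coefficients of a direct expansion of total order six sum to 203, the number of partitions of a set of six items, which gives a quick completeness check on any of the longer formulae.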
Corresponding to (2) we have an extension which, if taken to four variates, will give us

$$\mu_{1111} = \kappa_{1111} + \kappa_{1100}\kappa_{0011} + \kappa_{1010}\kappa_{0101} + \kappa_{1001}\kappa_{0110},$$

which is, of course, the form assumed for $\mu'_{1111}$ when terms on the right containing unit parts are ignored. Inversely

$$\kappa_{1111} = \mu_{1111} - \mu_{1100}\mu_{0011} - \mu_{1010}\mu_{0101} - \mu_{1001}\mu_{0110}.$$

For example, amalgamating the first two and last two suffices we get

$$\mu_{22} = \kappa_{22} + \kappa_{20}\kappa_{02} + 2\kappa_{11}^2,$$
$$\kappa_{22} = \mu_{22} - \mu_{20}\mu_{02} - 2\mu_{11}^2.$$

Higher order results can easily be checked by noting from tables of symmetric functions (David & Kendall, 1949) that all partitions have been included and that the sum of terms of a particular partition is the partition coefficient given in the table.
(c) Kendall (1940) introduced a set of operators A,,, which, operating on the appropriate 


univariate formula, will give any desired bivariate formula. The technique, expressed in terms of an equivalent operator suggested by Kendall for another purpose, is best described by a specific example. We have

$$\mu'_5 = \kappa_5 + 5\kappa_4\kappa_1 + 10\kappa_3\kappa_2 + 10\kappa_3\kappa_1^2 + 15\kappa_2^2\kappa_1 + 10\kappa_2\kappa_1^3 + \kappa_1^5$$

from published tables, or from the line $(1^5)$ in Table 1.5 of David & Kendall's tables of symmetric functions. Write this formally as

$$\mu'(r^5) = \kappa(r^5) + 5\kappa(r^4)\kappa(r) + 10\kappa(r^3)\kappa(r^2) + 10\kappa(r^3)\kappa^2(r) + 15\kappa^2(r^2)\kappa(r) + 10\kappa(r^2)\kappa^3(r) + \kappa^5(r),$$

and operate on it by $s\,\partial/\partial r$. Dividing by 5 we get

$$\begin{aligned}
\mu'(r^4s) &= \kappa(r^4s) + \kappa(r^4)\kappa(s) + 4\kappa(r^3s)\kappa(r) + 4\kappa(r^3)\kappa(rs) + 6\kappa(r^2s)\kappa(r^2) + 4\kappa(r^3)\kappa(r)\kappa(s)\\
&\quad + 6\kappa(r^2s)\kappa^2(r) + 3\kappa^2(r^2)\kappa(s) + 12\kappa(r^2)\kappa(rs)\kappa(r) + 6\kappa(r^2)\kappa^2(r)\kappa(s) + 4\kappa(rs)\kappa^3(r)\\
&\quad + \kappa^4(r)\kappa(s).
\end{aligned}$$














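Using the indices of $r$ and $s$ as suffices this gives

$$\begin{aligned}
\mu'_{41} &= \kappa_{41} + \kappa_{40}\kappa_{01} + 4\kappa_{31}\kappa_{10} + 4\kappa_{30}\kappa_{11} + 6\kappa_{21}\kappa_{20} + 4\kappa_{30}\kappa_{10}\kappa_{01} + 6\kappa_{21}\kappa_{10}^2 + 3\kappa_{20}^2\kappa_{01}\\
&\quad + 12\kappa_{20}\kappa_{11}\kappa_{10} + 6\kappa_{20}\kappa_{10}^2\kappa_{01} + 4\kappa_{11}\kappa_{10}^3 + \kappa_{10}^4\kappa_{01}.
\end{aligned}$$

$\mu'_{32}$ may similarly be obtained by operating on $\mu'(r^4s)$ by $s\,\partial/\partial r$. When we deal with moment-coefficients about the mean the operation is much simpler. We need only apply $s\,\partial/\partial r$ to

$$\mu(r^5) = \kappa(r^5) + 10\kappa(r^3)\kappa(r^2),$$

got from $\mu'(r^5)$ by deleting partitions having unit parts, to get

$$\mu_{41} = \kappa_{41} + 4\kappa_{30}\kappa_{11} + 6\kappa_{21}\kappa_{20}.$$

In both cases the inverse formulae can be written down at once.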
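The operator step is mechanical enough to sketch in code. Below, a formula is held as a mapping from a sorted tuple of (power of r, power of s) factors to its coefficient, and the operator replaces one r by an s in each factor in turn, as the product rule requires. This representation and the function name are illustrative choices, not Kendall's notation.

```python
from collections import Counter

def s_d_dr(poly):
    """Apply s d/dr: in each factor kappa(r^a s^b), turn one r into an s, weight a."""
    out = Counter()
    for factors, coeff in poly.items():
        for idx, (a, b) in enumerate(factors):
            if a > 0:
                new = list(factors)
                new[idx] = (a - 1, b + 1)
                out[tuple(sorted(new))] += coeff * a
    return dict(out)

# Univariate fifth moment about the origin in terms of cumulants.
mu5 = {
    ((5, 0),): 1,
    ((1, 0), (4, 0)): 5,
    ((2, 0), (3, 0)): 10,
    ((1, 0), (1, 0), (3, 0)): 10,
    ((1, 0), (2, 0), (2, 0)): 15,
    ((1, 0), (1, 0), (1, 0), (2, 0)): 10,
    ((1, 0),) * 5: 1,
}

mu41 = {key: value // 5 for key, value in s_d_dr(mu5).items()}
print(mu41)
```

Applying the same function to the two-term formula about the mean, `{((5, 0),): 1, ((2, 0), (3, 0)): 10}`, and dividing by 5 gives the three terms of the corresponding central result.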

Method (c) was in fact the one used. We give below all the formulae up to $p+q = 6$, tabulating $\mu'_{pq}$ and $\kappa_{pq}$ for $p \ge q$ only (interchange of suffices will give $\mu'_{qp}$ from $\mu'_{pq}$), and incorporating for completeness the univariate formulae already published.

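$$\begin{aligned}
\mu'_{10} &= \kappa_{10},\\
\mu'_{20} &= \kappa_{20} + \kappa_{10}^2,\\
\mu'_{11} &= \kappa_{11} + \kappa_{10}\kappa_{01},\\
\mu'_{30} &= \kappa_{30} + 3\kappa_{20}\kappa_{10} + \kappa_{10}^3,\\
\mu'_{21} &= \kappa_{21} + \kappa_{20}\kappa_{01} + 2\kappa_{11}\kappa_{10} + \kappa_{10}^2\kappa_{01},\\
\mu'_{40} &= \kappa_{40} + 4\kappa_{30}\kappa_{10} + 3\kappa_{20}^2 + 6\kappa_{20}\kappa_{10}^2 + \kappa_{10}^4,\\
\mu'_{31} &= \kappa_{31} + \kappa_{30}\kappa_{01} + 3\kappa_{21}\kappa_{10} + 3\kappa_{20}\kappa_{11} + 3\kappa_{20}\kappa_{10}\kappa_{01} + 3\kappa_{11}\kappa_{10}^2 + \kappa_{10}^3\kappa_{01},\\
\mu'_{22} &= \kappa_{22} + 2\kappa_{21}\kappa_{01} + \kappa_{20}\kappa_{01}^2 + \kappa_{20}\kappa_{02} + 2\kappa_{12}\kappa_{10} + 2\kappa_{11}^2 + 4\kappa_{11}\kappa_{10}\kappa_{01} + \kappa_{10}^2\kappa_{01}^2 + \kappa_{10}^2\kappa_{02},\\
\mu'_{50} &= \kappa_{50} + 5\kappa_{40}\kappa_{10} + 10\kappa_{30}\kappa_{20} + 10\kappa_{30}\kappa_{10}^2 + 15\kappa_{20}^2\kappa_{10} + 10\kappa_{20}\kappa_{10}^3 + \kappa_{10}^5,\\
\mu'_{41} &= \kappa_{41} + \kappa_{40}\kappa_{01} + 4\kappa_{31}\kappa_{10} + 4\kappa_{30}\kappa_{11} + 4\kappa_{30}\kappa_{10}\kappa_{01} + 6\kappa_{21}\kappa_{20} + 6\kappa_{21}\kappa_{10}^2 + 3\kappa_{20}^2\kappa_{01}\\
&\quad + 12\kappa_{20}\kappa_{11}\kappa_{10} + 6\kappa_{20}\kappa_{10}^2\kappa_{01} + 4\kappa_{11}\kappa_{10}^3 + \kappa_{10}^4\kappa_{01},\\
\mu'_{32} &= \kappa_{32} + 2\kappa_{31}\kappa_{01} + \kappa_{30}\kappa_{01}^2 + \kappa_{30}\kappa_{02} + 3\kappa_{22}\kappa_{10} + 6\kappa_{21}\kappa_{11} + 6\kappa_{21}\kappa_{10}\kappa_{01} + 3\kappa_{20}\kappa_{12}\\
&\quad + 6\kappa_{20}\kappa_{11}\kappa_{01} + 3\kappa_{20}\kappa_{10}\kappa_{01}^2 + 3\kappa_{20}\kappa_{10}\kappa_{02} + 3\kappa_{12}\kappa_{10}^2 + 6\kappa_{11}^2\kappa_{10} + 6\kappa_{11}\kappa_{10}^2\kappa_{01}\\
&\quad + \kappa_{10}^3\kappa_{01}^2 + \kappa_{10}^3\kappa_{02},\\
\mu'_{60} &= \kappa_{60} + 6\kappa_{50}\kappa_{10} + 15\kappa_{40}\kappa_{20} + 15\kappa_{40}\kappa_{10}^2 + 10\kappa_{30}^2 + 60\kappa_{30}\kappa_{20}\kappa_{10} + 20\kappa_{30}\kappa_{10}^3 + 15\kappa_{20}^3\\
&\quad + 45\kappa_{20}^2\kappa_{10}^2 + 15\kappa_{20}\kappa_{10}^4 + \kappa_{10}^6,\\
\mu'_{51} &= \kappa_{51} + \kappa_{50}\kappa_{01} + 5\kappa_{41}\kappa_{10} + 5\kappa_{40}\kappa_{11} + 5\kappa_{40}\kappa_{10}\kappa_{01} + 10\kappa_{31}\kappa_{20} + 10\kappa_{31}\kappa_{10}^2 + 10\kappa_{30}\kappa_{21}\\
&\quad + 10\kappa_{30}\kappa_{20}\kappa_{01} + 20\kappa_{30}\kappa_{11}\kappa_{10} + 10\kappa_{30}\kappa_{10}^2\kappa_{01} + 30\kappa_{21}\kappa_{20}\kappa_{10} + 10\kappa_{21}\kappa_{10}^3 + 15\kappa_{20}^2\kappa_{11}\\
&\quad + 15\kappa_{20}^2\kappa_{10}\kappa_{01} + 30\kappa_{20}\kappa_{11}\kappa_{10}^2 + 10\kappa_{20}\kappa_{10}^3\kappa_{01} + 5\kappa_{11}\kappa_{10}^4 + \kappa_{10}^5\kappa_{01},\\
\mu'_{42} &= \kappa_{42} + 2\kappa_{41}\kappa_{01} + \kappa_{40}\kappa_{01}^2 + \kappa_{40}\kappa_{02} + 4\kappa_{32}\kappa_{10} + 8\kappa_{31}\kappa_{11} + 8\kappa_{31}\kappa_{10}\kappa_{01} + 4\kappa_{30}\kappa_{12}\\
&\quad + 8\kappa_{30}\kappa_{11}\kappa_{01} + 4\kappa_{30}\kappa_{10}\kappa_{01}^2 + 4\kappa_{30}\kappa_{10}\kappa_{02} + 6\kappa_{22}\kappa_{20} + 6\kappa_{22}\kappa_{10}^2 + 6\kappa_{21}^2 + 12\kappa_{21}\kappa_{20}\kappa_{01}\\
&\quad + 24\kappa_{21}\kappa_{11}\kappa_{10} + 12\kappa_{21}\kappa_{10}^2\kappa_{01} + 3\kappa_{20}^2\kappa_{01}^2 + 3\kappa_{20}^2\kappa_{02} + 12\kappa_{20}\kappa_{12}\kappa_{10} + 12\kappa_{20}\kappa_{11}^2\\
&\quad + 24\kappa_{20}\kappa_{11}\kappa_{10}\kappa_{01} + 6\kappa_{20}\kappa_{10}^2\kappa_{01}^2 + 6\kappa_{20}\kappa_{10}^2\kappa_{02} + 4\kappa_{12}\kappa_{10}^3 + 12\kappa_{11}^2\kappa_{10}^2 + 8\kappa_{11}\kappa_{10}^3\kappa_{01}\\
&\quad + \kappa_{10}^4\kappa_{01}^2 + \kappa_{10}^4\kappa_{02},\\
\mu'_{33} &= \kappa_{33} + 3\kappa_{32}\kappa_{01} + 3\kappa_{31}\kappa_{01}^2 + 3\kappa_{31}\kappa_{02} + \kappa_{30}\kappa_{01}^3 + 3\kappa_{30}\kappa_{01}\kappa_{02} + \kappa_{30}\kappa_{03} + 3\kappa_{23}\kappa_{10} + 9\kappa_{22}\kappa_{11}\\
&\quad + 9\kappa_{22}\kappa_{10}\kappa_{01} + 9\kappa_{21}\kappa_{12} + 18\kappa_{21}\kappa_{11}\kappa_{01} + 9\kappa_{21}\kappa_{10}\kappa_{01}^2 + 9\kappa_{21}\kappa_{10}\kappa_{02} + 3\kappa_{20}\kappa_{13}\\
&\quad + 9\kappa_{20}\kappa_{12}\kappa_{01} + 9\kappa_{20}\kappa_{11}\kappa_{01}^2 + 9\kappa_{20}\kappa_{11}\kappa_{02} + 3\kappa_{20}\kappa_{10}\kappa_{01}^3 + 9\kappa_{20}\kappa_{10}\kappa_{01}\kappa_{02} + 3\kappa_{20}\kappa_{10}\kappa_{03}\\
&\quad + 3\kappa_{13}\kappa_{10}^2 + 18\kappa_{12}\kappa_{11}\kappa_{10} + 9\kappa_{12}\kappa_{10}^2\kappa_{01} + 6\kappa_{11}^3 + 18\kappa_{11}^2\kappa_{10}\kappa_{01} + 9\kappa_{11}\kappa_{10}^2\kappa_{01}^2\\
&\quad + 9\kappa_{11}\kappa_{10}^2\kappa_{02} + \kappa_{10}^3\kappa_{01}^3 + 3\kappa_{10}^3\kappa_{01}\kappa_{02} + \kappa_{10}^3\kappa_{03}.
\end{aligned}$$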









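$$\begin{aligned}
\kappa_{10} &= \mu'_{10},\\
\kappa_{20} &= \mu'_{20} - \mu'^{\,2}_{10},\\
\kappa_{11} &= \mu'_{11} - \mu'_{10}\mu'_{01},\\
\kappa_{30} &= \mu'_{30} - 3\mu'_{20}\mu'_{10} + 2\mu'^{\,3}_{10},\\
\kappa_{21} &= \mu'_{21} - \mu'_{20}\mu'_{01} - 2\mu'_{11}\mu'_{10} + 2\mu'^{\,2}_{10}\mu'_{01},\\
\kappa_{40} &= \mu'_{40} - 4\mu'_{30}\mu'_{10} - 3\mu'^{\,2}_{20} + 12\mu'_{20}\mu'^{\,2}_{10} - 6\mu'^{\,4}_{10},\\
\kappa_{31} &= \mu'_{31} - \mu'_{30}\mu'_{01} - 3\mu'_{21}\mu'_{10} - 3\mu'_{20}\mu'_{11} + 6\mu'_{20}\mu'_{10}\mu'_{01} + 6\mu'_{11}\mu'^{\,2}_{10} - 6\mu'^{\,3}_{10}\mu'_{01},\\
\kappa_{22} &= \mu'_{22} - 2\mu'_{21}\mu'_{01} + 2\mu'_{20}\mu'^{\,2}_{01} - \mu'_{20}\mu'_{02} - 2\mu'_{12}\mu'_{10} - 2\mu'^{\,2}_{11} + 8\mu'_{11}\mu'_{10}\mu'_{01} - 6\mu'^{\,2}_{10}\mu'^{\,2}_{01} + 2\mu'^{\,2}_{10}\mu'_{02},\\
\kappa_{50} &= \mu'_{50} - 5\mu'_{40}\mu'_{10} - 10\mu'_{30}\mu'_{20} + 20\mu'_{30}\mu'^{\,2}_{10} + 30\mu'^{\,2}_{20}\mu'_{10} - 60\mu'_{20}\mu'^{\,3}_{10} + 24\mu'^{\,5}_{10},\\
\kappa_{41} &= \mu'_{41} - \mu'_{40}\mu'_{01} - 4\mu'_{31}\mu'_{10} - 4\mu'_{30}\mu'_{11} + 8\mu'_{30}\mu'_{10}\mu'_{01} - 6\mu'_{21}\mu'_{20} + 12\mu'_{21}\mu'^{\,2}_{10} + 6\mu'^{\,2}_{20}\mu'_{01}\\
&\quad + 24\mu'_{20}\mu'_{11}\mu'_{10} - 36\mu'_{20}\mu'^{\,2}_{10}\mu'_{01} - 24\mu'_{11}\mu'^{\,3}_{10} + 24\mu'^{\,4}_{10}\mu'_{01},\\
\kappa_{32} &= \mu'_{32} - 2\mu'_{31}\mu'_{01} + 2\mu'_{30}\mu'^{\,2}_{01} - \mu'_{30}\mu'_{02} - 3\mu'_{22}\mu'_{10} - 6\mu'_{21}\mu'_{11} + 12\mu'_{21}\mu'_{10}\mu'_{01} - 3\mu'_{20}\mu'_{12}\\
&\quad + 12\mu'_{20}\mu'_{11}\mu'_{01} - 18\mu'_{20}\mu'_{10}\mu'^{\,2}_{01} + 6\mu'_{20}\mu'_{10}\mu'_{02} + 6\mu'_{12}\mu'^{\,2}_{10} + 12\mu'^{\,2}_{11}\mu'_{10} - 36\mu'_{11}\mu'^{\,2}_{10}\mu'_{01}\\
&\quad + 24\mu'^{\,3}_{10}\mu'^{\,2}_{01} - 6\mu'^{\,3}_{10}\mu'_{02},\\
\kappa_{60} &= \mu'_{60} - 6\mu'_{50}\mu'_{10} - 15\mu'_{40}\mu'_{20} + 30\mu'_{40}\mu'^{\,2}_{10} - 10\mu'^{\,2}_{30} + 120\mu'_{30}\mu'_{20}\mu'_{10} - 120\mu'_{30}\mu'^{\,3}_{10} + 30\mu'^{\,3}_{20}\\
&\quad - 270\mu'^{\,2}_{20}\mu'^{\,2}_{10} + 360\mu'_{20}\mu'^{\,4}_{10} - 120\mu'^{\,6}_{10},\\
\kappa_{51} &= \mu'_{51} - \mu'_{50}\mu'_{01} - 5\mu'_{41}\mu'_{10} - 5\mu'_{40}\mu'_{11} + 10\mu'_{40}\mu'_{10}\mu'_{01} - 10\mu'_{31}\mu'_{20} + 20\mu'_{31}\mu'^{\,2}_{10} - 10\mu'_{30}\mu'_{21}\\
&\quad + 20\mu'_{30}\mu'_{20}\mu'_{01} + 40\mu'_{30}\mu'_{11}\mu'_{10} - 60\mu'_{30}\mu'^{\,2}_{10}\mu'_{01} + 60\mu'_{21}\mu'_{20}\mu'_{10} - 60\mu'_{21}\mu'^{\,3}_{10} + 30\mu'^{\,2}_{20}\mu'_{11}\\
&\quad - 90\mu'^{\,2}_{20}\mu'_{10}\mu'_{01} - 180\mu'_{20}\mu'_{11}\mu'^{\,2}_{10} + 240\mu'_{20}\mu'^{\,3}_{10}\mu'_{01} + 120\mu'_{11}\mu'^{\,4}_{10} - 120\mu'^{\,5}_{10}\mu'_{01},\\
\kappa_{42} &= \mu'_{42} - 2\mu'_{41}\mu'_{01} + 2\mu'_{40}\mu'^{\,2}_{01} - \mu'_{40}\mu'_{02} - 4\mu'_{32}\mu'_{10} - 8\mu'_{31}\mu'_{11} + 16\mu'_{31}\mu'_{10}\mu'_{01} - 4\mu'_{30}\mu'_{12}\\
&\quad + 16\mu'_{30}\mu'_{11}\mu'_{01} - 24\mu'_{30}\mu'_{10}\mu'^{\,2}_{01} + 8\mu'_{30}\mu'_{10}\mu'_{02} - 6\mu'_{22}\mu'_{20} + 12\mu'_{22}\mu'^{\,2}_{10} - 6\mu'^{\,2}_{21}\\
&\quad + 24\mu'_{21}\mu'_{20}\mu'_{01} + 48\mu'_{21}\mu'_{11}\mu'_{10} - 72\mu'_{21}\mu'^{\,2}_{10}\mu'_{01} - 18\mu'^{\,2}_{20}\mu'^{\,2}_{01} + 6\mu'^{\,2}_{20}\mu'_{02} + 24\mu'_{20}\mu'_{12}\mu'_{10}\\
&\quad + 24\mu'_{20}\mu'^{\,2}_{11} - 144\mu'_{20}\mu'_{11}\mu'_{10}\mu'_{01} + 144\mu'_{20}\mu'^{\,2}_{10}\mu'^{\,2}_{01} - 36\mu'_{20}\mu'^{\,2}_{10}\mu'_{02} - 24\mu'_{12}\mu'^{\,3}_{10}\\
&\quad - 72\mu'^{\,2}_{11}\mu'^{\,2}_{10} + 192\mu'_{11}\mu'^{\,3}_{10}\mu'_{01} - 120\mu'^{\,4}_{10}\mu'^{\,2}_{01} + 24\mu'^{\,4}_{10}\mu'_{02},\\
\kappa_{33} &= \mu'_{33} - 3\mu'_{32}\mu'_{01} + 6\mu'_{31}\mu'^{\,2}_{01} - 3\mu'_{31}\mu'_{02} - 6\mu'_{30}\mu'^{\,3}_{01} + 6\mu'_{30}\mu'_{01}\mu'_{02} - \mu'_{30}\mu'_{03} - 3\mu'_{23}\mu'_{10}\\
&\quad - 9\mu'_{22}\mu'_{11} + 18\mu'_{22}\mu'_{10}\mu'_{01} - 9\mu'_{21}\mu'_{12} + 36\mu'_{21}\mu'_{11}\mu'_{01} - 54\mu'_{21}\mu'_{10}\mu'^{\,2}_{01} + 18\mu'_{21}\mu'_{10}\mu'_{02}\\
&\quad - 3\mu'_{20}\mu'_{13} + 18\mu'_{20}\mu'_{12}\mu'_{01} - 54\mu'_{20}\mu'_{11}\mu'^{\,2}_{01} + 18\mu'_{20}\mu'_{11}\mu'_{02} + 72\mu'_{20}\mu'_{10}\mu'^{\,3}_{01}\\
&\quad - 54\mu'_{20}\mu'_{10}\mu'_{01}\mu'_{02} + 6\mu'_{20}\mu'_{10}\mu'_{03} + 6\mu'_{13}\mu'^{\,2}_{10} + 36\mu'_{12}\mu'_{11}\mu'_{10} - 54\mu'_{12}\mu'^{\,2}_{10}\mu'_{01} + 12\mu'^{\,3}_{11}\\
&\quad - 108\mu'^{\,2}_{11}\mu'_{10}\mu'_{01} + 216\mu'_{11}\mu'^{\,2}_{10}\mu'^{\,2}_{01} - 54\mu'_{11}\mu'^{\,2}_{10}\mu'_{02} - 120\mu'^{\,3}_{10}\mu'^{\,3}_{01} + 72\mu'^{\,3}_{10}\mu'_{01}\mu'_{02}\\
&\quad - 6\mu'^{\,3}_{10}\mu'_{03}.
\end{aligned}$$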


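When the origin is at the mean ($\kappa_{10} = \kappa_{01} = 0$):

$$\begin{aligned}
\mu_{20} &= \kappa_{20},\\
\mu_{11} &= \kappa_{11},\\
\mu_{30} &= \kappa_{30},\\
\mu_{21} &= \kappa_{21},\\
\mu_{40} &= \kappa_{40} + 3\kappa_{20}^2,\\
\mu_{31} &= \kappa_{31} + 3\kappa_{20}\kappa_{11},\\
\mu_{22} &= \kappa_{22} + \kappa_{20}\kappa_{02} + 2\kappa_{11}^2,\\
\mu_{50} &= \kappa_{50} + 10\kappa_{30}\kappa_{20},\\
\mu_{41} &= \kappa_{41} + 4\kappa_{30}\kappa_{11} + 6\kappa_{21}\kappa_{20},\\
\mu_{32} &= \kappa_{32} + \kappa_{30}\kappa_{02} + 6\kappa_{21}\kappa_{11} + 3\kappa_{20}\kappa_{12},\\
\mu_{60} &= \kappa_{60} + 15\kappa_{40}\kappa_{20} + 10\kappa_{30}^2 + 15\kappa_{20}^3,\\
\mu_{51} &= \kappa_{51} + 5\kappa_{40}\kappa_{11} + 10\kappa_{31}\kappa_{20} + 10\kappa_{30}\kappa_{21} + 15\kappa_{20}^2\kappa_{11},\\
\mu_{42} &= \kappa_{42} + \kappa_{40}\kappa_{02} + 8\kappa_{31}\kappa_{11} + 4\kappa_{30}\kappa_{12} + 6\kappa_{22}\kappa_{20} + 6\kappa_{21}^2 + 3\kappa_{20}^2\kappa_{02} + 12\kappa_{20}\kappa_{11}^2,\\
\mu_{33} &= \kappa_{33} + 3\kappa_{31}\kappa_{02} + \kappa_{30}\kappa_{03} + 9\kappa_{22}\kappa_{11} + 9\kappa_{21}\kappa_{12} + 3\kappa_{20}\kappa_{13} + 9\kappa_{20}\kappa_{11}\kappa_{02} + 6\kappa_{11}^3,
\end{aligned}$$

and

$$\begin{aligned}
\kappa_{20} &= \mu_{20},\\
\kappa_{11} &= \mu_{11},\\
\kappa_{30} &= \mu_{30},\\
\kappa_{21} &= \mu_{21},\\
\kappa_{40} &= \mu_{40} - 3\mu_{20}^2,\\
\kappa_{31} &= \mu_{31} - 3\mu_{20}\mu_{11},\\
\kappa_{22} &= \mu_{22} - \mu_{20}\mu_{02} - 2\mu_{11}^2,\\
\kappa_{50} &= \mu_{50} - 10\mu_{30}\mu_{20},\\
\kappa_{41} &= \mu_{41} - 4\mu_{30}\mu_{11} - 6\mu_{21}\mu_{20},\\
\kappa_{32} &= \mu_{32} - \mu_{30}\mu_{02} - 6\mu_{21}\mu_{11} - 3\mu_{20}\mu_{12},\\
\kappa_{60} &= \mu_{60} - 15\mu_{40}\mu_{20} - 10\mu_{30}^2 + 30\mu_{20}^3,\\
\kappa_{51} &= \mu_{51} - 5\mu_{40}\mu_{11} - 10\mu_{31}\mu_{20} - 10\mu_{30}\mu_{21} + 30\mu_{20}^2\mu_{11},\\
\kappa_{42} &= \mu_{42} - \mu_{40}\mu_{02} - 8\mu_{31}\mu_{11} - 4\mu_{30}\mu_{12} - 6\mu_{22}\mu_{20} - 6\mu_{21}^2 + 6\mu_{20}^2\mu_{02} + 24\mu_{20}\mu_{11}^2,\\
\kappa_{33} &= \mu_{33} - 3\mu_{31}\mu_{02} - \mu_{30}\mu_{03} - 9\mu_{22}\mu_{11} - 9\mu_{21}\mu_{12} - 3\mu_{20}\mu_{13} + 18\mu_{20}\mu_{11}\mu_{02} + 12\mu_{11}^3.
\end{aligned}$$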
3. The univariate k-statistic $k_r$ introduced by Fisher (1929) is a function of the sample observations $x_1, \ldots, x_n$ such that $E(k_r) = \kappa_r$. It may therefore be written as

$$k_r = \sum \frac{(-1)^{\rho-1}(\rho-1)!\;r!}{(r_1!)^{\pi_1}(r_2!)^{\pi_2}\ldots\pi_1!\,\pi_2!\ldots}\;\frac{[r_1^{\pi_1}r_2^{\pi_2}\ldots]}{n^{(\rho)}},$$

in which the typical term corresponds to the partition $[r_1^{\pi_1}r_2^{\pi_2}\ldots]$ of the integer $r$, with $\sum_j (r_j\pi_j) = r$ and $\sum_j \pi_j = \rho$, and the summation is over all partitions. $[r_1^{\pi_1}r_2^{\pi_2}\ldots]$ is the augmented symmetric function of $x_1, x_2, \ldots, x_n$ (David & Kendall, 1949) and $n^{(\rho)}$ is the factorial expression $n(n-1)\ldots(n-\rho+1)$. To see that $k_r$ has this form we need only note that the first part of the term under the summation sign is the coefficient of $\mu'^{\,\pi_1}_{r_1}\mu'^{\,\pi_2}_{r_2}\ldots$ in the expansion of $\kappa_r$ in terms of moment coefficients about the origin, while

$$E[r_1^{\pi_1}r_2^{\pi_2}\ldots] = n^{(\rho)}\,\mu'^{\,\pi_1}_{r_1}\mu'^{\,\pi_2}_{r_2}\ldots$$

The tables of symmetric functions at once give $k_r$ in terms of the power sums $s_r$, and they can be abbreviated by being expressed in terms of the sample estimates $m_r$ of the moment coefficients about the mean (with the single exception that $k_1 = m'_1$, the sample mean).

In analogous fashion, if our sample consists of $n$ pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ from a bivariate population, we define a k-statistic $k_{pq}$ such that $E(k_{pq}) = \kappa_{pq}$. It may then be written as

$$k_{pq} = \sum \frac{(-1)^{\rho-1}(\rho-1)!\; p!\, q!}{(p_1!\,q_1!)^{\pi_1}\pi_1!\,(p_2!\,q_2!)^{\pi_2}\pi_2!\cdots}\;\frac{[(p_1q_1)^{\pi_1}(p_2q_2)^{\pi_2}\cdots]}{n^{(\rho)}},$$














184 Bi-variate k-statistics and cumulants of their joint sampling. distribution 


in which the unipartite number $r$ is replaced by the bipartite number $(pq)$ and the typical term corresponds to its partition $\{(p_1q_1)^{\pi_1}(p_2q_2)^{\pi_2}\ldots\}$. Values for $k_{11}$, $k_{21}$, $k_{31}$ and $k_{22}$ have been given by Fisher (1929) in terms of the power sums $s_{pq}$ ($k_{20}$, $k_{30}$, etc., are univariate results). The tables we shall give only involve $k_{11}$ and $k_{21}$, and these are very simply expressed as

$$k_{11} = \frac{n}{n-1}\,m_{11}, \qquad k_{21} = \frac{n^2}{(n-1)(n-2)}\,m_{21},$$

where $m_{pq}$ is the sample moment coefficient about the mean $(\bar{x}, \bar{y})$ corresponding to

$$m'_{pq} = \sum_{i=1}^{n} x_i^p y_i^q / n \quad \text{with } m'_{10} = \bar{x},\ m'_{01} = \bar{y}.$$
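The two expressions for $k_{11}$ and $k_{21}$ are straightforward to evaluate from data. The following Python sketch does so in exact rational arithmetic; the function names and the small integer data set are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction

def sample_moment_about_mean(xs, ys, p, q):
    """m_pq: mean of (x - xbar)^p (y - ybar)^q over the n pairs (integer data assumed)."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    ybar = Fraction(sum(ys), n)
    return sum((x - xbar) ** p * (y - ybar) ** q for x, y in zip(xs, ys)) / n

def k11(xs, ys):
    """k_11 = n m_11 / (n - 1), the unbiased estimate of kappa_11."""
    n = len(xs)
    return Fraction(n, n - 1) * sample_moment_about_mean(xs, ys, 1, 1)

def k21(xs, ys):
    """k_21 = n^2 m_21 / ((n - 1)(n - 2)), the unbiased estimate of kappa_21."""
    n = len(xs)
    return Fraction(n * n, (n - 1) * (n - 2)) * sample_moment_about_mean(xs, ys, 2, 1)

# Hypothetical data: four pairs lying on y = x^2.
xs = [0, 1, 2, 3]
ys = [0, 1, 4, 9]
print(k11(xs, ys), k21(xs, ys))
```

Note that $k_{11}$ is simply the familiar sample covariance with divisor $n - 1$.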
ix 


4. Our problem now resolves itself into the determination of the cumulants of the joint distribution of these bivariate k-statistics. To give a simple example, let $k_{\alpha\alpha'}$ and $k_{\beta\beta'}$ be two such statistics, and suppose we require the $\kappa_{rs}$ of their joint distribution. To distinguish this from the population $\kappa_{rs}$ we shall write it $\kappa[(\alpha\alpha')^r(\beta\beta')^s]$, with corresponding notations for the $\mu_{rs}$ and $\mu'_{rs}$ of the joint distribution. The more usual notation, which we preserve in the
tables at the end of this paper, would be $\kappa\begin{pmatrix}\alpha^r & \beta^s\\ \alpha'^{\,r} & \beta'^{\,s}\end{pmatrix}$, but for simplicity in writing and


setting the present notation is better. The first step in the algebraic derivation is always to
calculate

$$\mu'[(\alpha\alpha')^r(\beta\beta')^s] = E(k_{\alpha\alpha'}^{\,r}\,k_{\beta\beta'}^{\,s}),$$

which will be given in terms of population bivariate moment-coefficients about the mean if partitions involving unit parts are ignored. These can be converted into population cumulants. We then invoke the formulae of §2 which, in the present notation, will give $\mu'[(\alpha\alpha')^r(\beta\beta')^s]$ in terms of $\kappa[(\alpha\alpha')^r(\beta\beta')^s]$ and other terms which are combinations of lower order $\kappa$'s, which will have been calculated already. By removing all surplus terms from $\mu'[(\alpha\alpha')^r(\beta\beta')^s]$ we are finally left with $\kappa[(\alpha\alpha')^r(\beta\beta')^s]$. Up to the third order the result is identical with that for

$$\mu[(\alpha\alpha')^r(\beta\beta')^s] = E\{(k_{\alpha\alpha'} - \kappa_{\alpha\alpha'})^r (k_{\beta\beta'} - \kappa_{\beta\beta'})^s\}, \quad (r + s \le 3).$$

Purely algebraic derivation of even the comparatively simple results given at the end of this paper would be a formidable task, and we seek other methods. One which naturally commends itself is Fisher's combinatorial technique (Fisher, 1929), which is applied throughout as in the case of the univariate k-statistics, but distinguishing two kinds of objects in the various partitions. The n-coefficients, which depend only on the patterns, remain the same, and we have the advantage of a check in the sense that a single term of a univariate k-statistic result has its numerical coefficient already known, and we can be reasonably sure of our accuracy if the coefficients of the subpartitions add up to this total. To illustrate, let us find the coefficient of $\kappa_{32}\kappa_{20}^2$ in $\kappa[(30)(21)^2]$. The corresponding univariate result is the coefficient of $\kappa_5\kappa_2^2$ in $\kappa(3^3)$. For this the partitions are








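$$(a)\quad \begin{array}{ccc|c} 3 & 1 & 1 & 5\\ 0 & 1 & 1 & 2\\ 0 & 1 & 1 & 2\\ \hline 3 & 3 & 3 & \end{array} \qquad\qquad (b)\quad \begin{array}{ccc|c} 2 & 2 & 1 & 5\\ 1 & 0 & 1 & 2\\ 0 & 1 & 1 & 2\\ \hline 3 & 3 & 3 & \end{array}$$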













These contribute $54/\{(n-1)(n-2)\}$ and $162/(n-1)^2$ respectively. To deal with the bivariate problem we must alter the column entries so that in two columns the three 1's are replaced by two 1's and an X. We then find that (a) evolves into six partitions, two of which have row totals (32), (20), (20). They are

$$\begin{array}{ccc|c} (111) & X & X & (32)\\ \cdot & 1 & 1 & (20)\\ \cdot & 1 & 1 & (20)\\ \hline (30) & (21) & (21) & \end{array} \qquad\qquad \begin{array}{ccc|c} (11X) & 1 & X & (32)\\ \cdot & 1 & 1 & (20)\\ \cdot & 1 & 1 & (20)\\ \hline (21) & (30) & (21) & \end{array}$$


The numerical coefficients, i.e. the numbers of ways of assigning the objects subject to the row conditions, are 2 and 12 respectively. This gives a contribution $14/\{(n-1)(n-2)\}$ to the desired result. On the other hand, (b) evolves into nine partitions, of which two concern us, namely,

$$\begin{array}{ccc|c} (11) & (1X) & X & (32)\\ 1 & \cdot & 1 & (20)\\ \cdot & 1 & 1 & (20)\\ \hline (30) & (21) & (21) & \end{array} \qquad\qquad \begin{array}{ccc|c} (1X) & (1X) & 1 & (32)\\ 1 & \cdot & 1 & (20)\\ \cdot & 1 & 1 & (20)\\ \hline (21) & (21) & (30) & \end{array}$$

the numerical coefficients are both 24, thus providing a contribution $48/(n-1)^2$. Finally, the coefficient of $\kappa_{32}\kappa_{20}^2$ in $\kappa[(30)(21)^2]$ is

$$\frac{14}{(n-1)(n-2)} + \frac{48}{(n-1)^2} = \frac{2(31n-55)}{(n-1)^2(n-2)}.$$
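The other partitions provide the coefficients of $\kappa_{50}\kappa_{20}\kappa_{02}$, $\kappa_{50}\kappa_{11}^2$ and $\kappa_{41}\kappa_{20}\kappa_{11}$. The reader who cares to complete the process will verify that (a) yields

$$\frac{2}{(n-1)(n-2)}\,(2\kappa_{50}\kappa_{20}\kappa_{02} + 2\kappa_{50}\kappa_{11}^2 + 16\kappa_{41}\kappa_{20}\kappa_{11} + 7\kappa_{32}\kappa_{20}^2),$$

while (b) similarly yields

$$\frac{6}{(n-1)^2}\,(2\kappa_{50}\kappa_{20}\kappa_{02} + 3\kappa_{50}\kappa_{11}^2 + 14\kappa_{41}\kappa_{20}\kappa_{11} + 8\kappa_{32}\kappa_{20}^2).$$

Adding term by term gives us four terms of $\kappa[(30)(21)^2]$.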
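The term-by-term addition is simple rational arithmetic and can be checked mechanically. The Python sketch below adds the contributions of (a) and (b) for a range of sample sizes and compares each sum with a single fraction over $(n-1)^2(n-2)$. The string labels are illustrative, and the closed forms other than that for $\kappa_{32}\kappa_{20}^2$ are worked out here from the two bracketed expressions rather than quoted from this section.

```python
from fractions import Fraction

TERMS = ("k50 k20 k02", "k50 k11^2", "k41 k20 k11", "k32 k20^2")
FROM_A = dict(zip(TERMS, (2, 2, 16, 7)))   # bracket multiplying 2/((n-1)(n-2))
FROM_B = dict(zip(TERMS, (2, 3, 14, 8)))   # bracket multiplying 6/(n-1)^2

def combined(n):
    """Sum of the contributions of partitions (a) and (b) for sample size n."""
    a = Fraction(2, (n - 1) * (n - 2))
    b = Fraction(6, (n - 1) ** 2)
    return {t: a * FROM_A[t] + b * FROM_B[t] for t in TERMS}

def closed_form(n):
    """The same four coefficients written over the common denominator."""
    c = Fraction(2, (n - 1) ** 2 * (n - 2))
    return {
        "k50 k20 k02": c * 2 * (4 * n - 7),
        "k50 k11^2": c * (11 * n - 20),
        "k41 k20 k11": c * 2 * (29 * n - 50),
        "k32 k20^2": c * (31 * n - 55),
    }

agree = all(combined(n) == closed_form(n) for n in range(3, 60))
print(agree)
```

Both brackets sum to 27, so the totals $54/\{(n-1)(n-2)\}$ and $162/(n-1)^2$ of the univariate partitions are recovered, which is the check described in the text.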


5. In the particular case of the joint sampling distribution of $k_{20}$, $k_{11}$ and $k_{02}$ in samples from a normal bivariate population, Wishart (1928) showed how it was possible to generalize
the results. He also generalized the procedure to deal with as many variates (beyond two) 
as might be required, and gave the completely general cumulants, to the fourth order, of 
the simultaneous distribution of the second-order moment statistics, on the assumption of 
normality of the parent multivariate population. All special cases can be derived from these 
general results by coalescing the variates in all possible ways. 

6. The method we have followed in deriving the results listed at the end of this paper, which are for samples from a general (infinite) population, is the operational method due to Kendall (1940), essentially an extension of the method already used in §2. We shall illustrate by working out all the terms of $\kappa[(30)(21)^2]$ which correspond to that of $\kappa_5\kappa_2^2$ in $\kappa[3^3]$. Write $\kappa[3^3] = \kappa[(30)^3]$ formally as $\kappa[r_1^3 r_2^3 r_3^3]$ and operate on it by

$$S = s_1s_2\frac{\partial^2}{\partial r_1\,\partial r_2} + s_2s_3\frac{\partial^2}{\partial r_2\,\partial r_3} + s_3s_1\frac{\partial^2}{\partial r_3\,\partial r_1},$$

obtaining

$$9\kappa[r_1^2s_1\; r_2^2s_2\; r_3^3] + 9\kappa[r_1^3\; r_2^2s_2\; r_3^2s_3] + 9\kappa[r_1^2s_1\; r_2^3\; r_3^2s_3].$$

Retaining the indices only of the $r$'s and $s$'s this becomes $27\kappa[(30)(21)^2]$, since the order in which the constituents $k_{30}$, $k_{21}$ and $k_{21}$ are written is immaterial. We have therefore to operate with $S/27$ on expressions which correspond with the partitions (a) and (b) of §4. The first, taking account of row composition, is written as $\kappa(r_1^3r_2r_3)\,\kappa^2(r_2r_3)$, and operating with $S/27$ yields

$$\begin{aligned}
\tfrac{1}{27}[\,&\kappa(r_1^3s_2s_3)\,\kappa^2(r_2r_3) + 2\kappa(r_1^3s_2r_3)\,\kappa(r_2r_3)\,\kappa(r_2s_3) + 2\kappa(r_1^3r_2r_3)\,\kappa(r_2r_3)\,\kappa(s_2s_3) \\
&+ 2\kappa(r_1^3r_2r_3)\,\kappa(r_2s_3)\,\kappa(s_2r_3) + 2\kappa(r_1^3r_2s_3)\,\kappa(r_2r_3)\,\kappa(s_2r_3) + 3\kappa(r_1^2s_1s_2r_3)\,\kappa^2(r_2r_3) \\
&+ 6\kappa(r_1^2s_1r_2r_3)\,\kappa(r_2r_3)\,\kappa(s_2r_3) + 3\kappa(r_1^2s_1r_2s_3)\,\kappa^2(r_2r_3) + 6\kappa(r_1^2s_1r_2r_3)\,\kappa(r_2r_3)\,\kappa(r_2s_3)\,].
\end{aligned}$$

Using indices only of the $r$'s and $s$'s, and collecting terms, this reduces to

$$\tfrac{1}{27}[\,2\kappa_{50}\kappa_{20}\kappa_{02} + 2\kappa_{50}\kappa_{11}^2 + 16\kappa_{41}\kappa_{20}\kappa_{11} + 7\kappa_{32}\kappa_{20}^2\,].$$


Noting that the coefficient of partition (a) of §4 is $54/\{(n-1)(n-2)\}$ we see that this partition contributes the term

$$\frac{2}{(n-1)(n-2)}\,[\,2\kappa_{50}\kappa_{20}\kappa_{02} + 2\kappa_{50}\kappa_{11}^2 + 16\kappa_{41}\kappa_{20}\kappa_{11} + 7\kappa_{32}\kappa_{20}^2\,]$$

to the desired result. This agrees with the result partially worked out in §4 by the combinatorial method.

The expression corresponding to partition (b) of §4, which has coefficient $162/(n-1)^2$, is written $\kappa(r_1^2r_2^2r_3)\,\kappa(r_1r_3)\,\kappa(r_2r_3)$, and operating similarly with $S/27$ yields finally the terms

$$\frac{6}{(n-1)^2}\,(2\kappa_{50}\kappa_{20}\kappa_{02} + 3\kappa_{50}\kappa_{11}^2 + 14\kappa_{41}\kappa_{20}\kappa_{11} + 8\kappa_{32}\kappa_{20}^2).$$

Adding the two expressions term by term we have the final result that the terms of $\kappa[(30)(21)^2]$ corresponding to that of $\kappa_5\kappa_2^2$ in $\kappa[3^3]$ are

$$\frac{2}{(n-1)^2(n-2)}\,\{2(4n-7)\,\kappa_{50}\kappa_{20}\kappa_{02} + (11n-20)\,\kappa_{50}\kappa_{11}^2 + 2(29n-50)\,\kappa_{41}\kappa_{20}\kappa_{11} + (31n-55)\,\kappa_{32}\kappa_{20}^2\}.$$

By using this operator $S/27$ on all the partitions which contribute to the terms of $\kappa[3^3]$ we build up a complete expression for $\kappa[(30)(21)^2]$. For every such third-order cumulant, based on $\kappa[3^3]$, of the joint distribution, there is an appropriate operator. For example, $\kappa[(21)^2(12)]$ may be obtained by using the operator

$$\frac{1}{162}\sum s_1s_2s_3^2\,\frac{\partial^4}{\partial r_1\,\partial r_2\,\partial r_3^2}.$$

Similarly, cumulants of all orders of the joint distribution may be worked out by choosing the operator appropriately.
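The operation by $S/27$ is purely mechanical and can be sketched in a few lines of Python. The representation below is an assumption made for illustration and is not Kendall's or the author's notation: a factor $\kappa(r_1^a r_2^b r_3^c s_1^d s_2^e s_3^f)$ is stored as the tuple `(a, b, c, d, e, f)`, and $s_i\,\partial/\partial r_i$ is applied by the product rule, turning one $r_i$ into an $s_i$ in each factor in turn.

```python
from collections import Counter

def shift(expr, i):
    """Apply s_i * d/dr_i to an expression {term: coefficient} by the product rule."""
    out = Counter()
    for term, coeff in expr.items():
        for j, factor in enumerate(term):
            power = factor[i]
            if power == 0:
                continue  # this factor does not involve r_i
            new_factor = list(factor)
            new_factor[i] -= 1       # one r_i removed ...
            new_factor[3 + i] += 1   # ... and replaced by s_i
            new_term = term[:j] + (tuple(new_factor),) + term[j + 1:]
            out[tuple(sorted(new_term))] += coeff * power
    return out

def apply_S(expr):
    """S = s1 s2 d2/dr1dr2 + s2 s3 d2/dr2dr3 + s3 s1 d2/dr3dr1."""
    total = Counter()
    for i, j in ((0, 1), (1, 2), (2, 0)):
        total.update(shift(shift(expr, i), j))
    return total

def to_cumulant_indices(expr):
    """Keep only the numbers of r's and s's in each factor, giving kappa_pq products."""
    out = Counter()
    for term, coeff in expr.items():
        key = tuple(sorted((sum(f[:3]), sum(f[3:])) for f in term))
        out[key] += coeff
    return out

# Partition (a): kappa(r1^3 r2 r3) kappa^2(r2 r3); partition (b): kappa(r1^2 r2^2 r3) kappa(r1 r3) kappa(r2 r3).
partition_a = Counter({((3, 1, 1, 0, 0, 0), (0, 1, 1, 0, 0, 0), (0, 1, 1, 0, 0, 0)): 1})
partition_b = Counter({((2, 2, 1, 0, 0, 0), (1, 0, 1, 0, 0, 0), (0, 1, 1, 0, 0, 0)): 1})
result_a = to_cumulant_indices(apply_S(partition_a))
result_b = to_cumulant_indices(apply_S(partition_b))
```

The resulting counts are the brackets quoted in the text before division by 27; for instance the key `((2, 0), (2, 0), (3, 2))` stands for $\kappa_{32}\kappa_{20}^2$.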


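7. The procedure indicated in §6 is somewhat laborious, but is straightforward and therefore avoids errors of omission. It is not, however, necessary to obtain every single result by a separate operation. Consider, for example, the univariate result

$$\kappa(2^3) = \frac{\kappa_6}{n^2} + \frac{12\kappa_4\kappa_2}{n(n-1)} + \frac{4(n-2)}{n(n-1)^2}\,\kappa_3^2 + \frac{8}{(n-1)^2}\,\kappa_2^3,$$

corresponding to the partitions

$$\begin{array}{ccc|c} 2 & 2 & 2 & 6\\ \hline 2 & 2 & 2 & \end{array} \qquad \begin{array}{ccc|c} 2 & 1 & 1 & 4\\ 0 & 1 & 1 & 2\\ \hline 2 & 2 & 2 & \end{array} \qquad \begin{array}{ccc|c} 1 & 1 & 1 & 3\\ 1 & 1 & 1 & 3\\ \hline 2 & 2 & 2 & \end{array} \qquad \begin{array}{ccc|c} 1 & 1 & 0 & 2\\ 1 & 0 & 1 & 2\\ 0 & 1 & 1 & 2\\ \hline 2 & 2 & 2 & \end{array}$$

Formally the result may be written

$$\kappa[r_1^2r_2^2r_3^2] = \frac{1}{n^2}\,\kappa(r_1^2r_2^2r_3^2) + \frac{12}{n(n-1)}\,\kappa(r_1^2r_2r_3)\,\kappa(r_2r_3) + \frac{4(n-2)}{n(n-1)^2}\,\kappa^2(r_1r_2r_3) + \frac{8}{(n-1)^2}\,\kappa(r_1r_2)\,\kappa(r_1r_3)\,\kappa(r_2r_3).$$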

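Operation by $\sum_i s_i\,\partial/\partial r_i$ $(i = 1, 2, 3)$ will yield $\kappa[(20)^2(11)]$. But this is evidently equivalent to the easier process: operating by $s\,\partial/\partial r$ on

$$\frac{1}{n^2}\,\kappa(r^6) + \frac{12}{n(n-1)}\,\kappa(r^4)\,\kappa(r^2) + \frac{4(n-2)}{n(n-1)^2}\,\kappa^2(r^3) + \frac{8}{(n-1)^2}\,\kappa^3(r^2).$$

Similarly,

$$\sum_i s_i^2\frac{\partial^2}{\partial r_i^2} + 2\sum_{i<j} s_is_j\frac{\partial^2}{\partial r_i\,\partial r_j} \equiv s^2\frac{\partial^2}{\partial r^2},$$

so that operation by $s^2\partial^2/\partial r^2$ will produce $\kappa[(20)^2(02)] + 4\kappa[(20)(11)^2]$. If one of these results has been obtained by the procedure of §6 the other may be obtained by difference if the relatively easy operation by $s^2\partial^2/\partial r^2$ is carried out. In general, it is found that for any given order approximately half the cumulants of the joint distribution must be worked out by the methods of §4 or §6, while the remainder can be obtained by using the appropriate power of $s\,\partial/\partial r$ as the operator.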
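As a hedged illustration of this shortcut, the Python sketch below applies $s\,\partial/\partial r$ to the univariate expression for $\kappa(2^3)$ and divides by 6, the factor picked up when each of the three $r^2$ is differentiated. The representation (a factor $\kappa(r^a s^b)$ stored as the pair `(a, b)`) and the normalisation by 6 are assumptions of this sketch; the paper leaves the numerical factor implicit.

```python
from collections import Counter
from fractions import Fraction

def s_d_dr(expr):
    """Apply s * d/dr to {term: coefficient}; a factor kappa(r^a s^b) is the pair (a, b)."""
    out = Counter()
    for term, coeff in expr.items():
        for j, (a, b) in enumerate(term):
            if a == 0:
                continue
            new_term = term[:j] + ((a - 1, b + 1),) + term[j + 1:]
            out[tuple(sorted(new_term))] += coeff * a
    return out

def kappa_2_cubed(n):
    """Univariate kappa(2^3) for sample size n, with every index written as (order, 0)."""
    return Counter({
        ((6, 0),): Fraction(1, n * n),
        ((2, 0), (4, 0)): Fraction(12, n * (n - 1)),
        ((3, 0), (3, 0)): Fraction(4 * (n - 2), n * (n - 1) ** 2),
        ((2, 0), (2, 0), (2, 0)): Fraction(8, (n - 1) ** 2),
    })

def kappa_20_20_11(n):
    """kappa[(20)^2 (11)] obtained as (s d/dr) kappa(2^3) / 6."""
    return {term: coeff / 6 for term, coeff in s_d_dr(kappa_2_cubed(n)).items()}

example = kappa_20_20_11(10)
```

Under these assumptions the sketch gives $\kappa_{51}/n^2 + 8\kappa_{31}\kappa_{20}/\{n(n-1)\} + 4\kappa_{40}\kappa_{11}/\{n(n-1)\} + 4(n-2)\kappa_{30}\kappa_{21}/\{n(n-1)^2\} + 8\kappa_{20}^2\kappa_{11}/(n-1)^2$.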

As already indicated, the method used to obtain the formulae which follow was that of
§6, but the operator discussed in the present section has provided a useful check on the
results. Since most of the results are new it was considered desirable to work them out by 
two different methods. 


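$$\begin{aligned}
\kappa\begin{pmatrix}2 & 2\\ 0 & 0\end{pmatrix} &= \frac{1}{n}\kappa_{40} + \frac{2}{n-1}\kappa_{20}^2, \\
\kappa\begin{pmatrix}2 & 1\\ 0 & 1\end{pmatrix} &= \frac{1}{n}\kappa_{31} + \frac{2}{n-1}\kappa_{20}\kappa_{11}, \\
\kappa\begin{pmatrix}2 & 0\\ 0 & 2\end{pmatrix} &= \frac{1}{n}\kappa_{22} + \frac{2}{n-1}\kappa_{11}^2, \\
\kappa\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix} &= \frac{1}{n}\kappa_{22} + \frac{1}{n-1}\kappa_{20}\kappa_{02} + \frac{1}{n-1}\kappa_{11}^2, \\
\kappa\begin{pmatrix}2 & 2 & 2\\ 0 & 0 & 0\end{pmatrix} &= \frac{1}{n^2}\kappa_{60} + \frac{12}{n(n-1)}\kappa_{40}\kappa_{20} + \frac{4(n-2)}{n(n-1)^2}\kappa_{30}^2 + \frac{8}{(n-1)^2}\kappa_{20}^3, \\
\kappa\begin{pmatrix}2 & 2 & 1\\ 0 & 0 & 1\end{pmatrix} &= \frac{1}{n^2}\kappa_{51} + \frac{8}{n(n-1)}\kappa_{31}\kappa_{20} + \frac{4}{n(n-1)}\kappa_{40}\kappa_{11} + \frac{4(n-2)}{n(n-1)^2}\kappa_{30}\kappa_{21} + \frac{8}{(n-1)^2}\kappa_{20}^2\kappa_{11}, \\
\kappa\begin{pmatrix}2 & 2 & 0\\ 0 & 0 & 2\end{pmatrix} &= \frac{1}{n^2}\kappa_{42} + \frac{4}{n(n-1)}\kappa_{22}\kappa_{20} + \frac{8}{n(n-1)}\kappa_{31}\kappa_{11} + \frac{4(n-2)}{n(n-1)^2}\kappa_{21}^2 + \frac{8}{(n-1)^2}\kappa_{20}\kappa_{11}^2,
\end{aligned}$$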



























man 3. 2s cs Te 4, 2n-2) 
011 ~~ m2 42 n(n —1) Ke2Ko9 n(n — 1) Kg) Ky n(n — 1) K4oKo2 n(n — 1)? 21 
2(n — 2) 6 pee 
n(n— 1)2“s0% ti)” 20 Kh + Go ype Mok on 
sedan or Koay, +———_~ : gee Sargeras. OR a 
021 m2 33 n(n — 1) 22™11 n(n — 1) 20"13 n(n—1) 3102 n(n — 1)? 2112 
- 2 4 
Pr es ee 
raal- n(n —1) 822° t+ a1) 20X13 thin — n(n — 1) <31Xee seni} 122112 
(n—2) 6 
n(n—1)2 K30 Kost in sHpekta + Ga ape ks M202: 
d2222_ 1, , 24 | 82(n—2) (An? On +8) 
0.0.0 0) 18° * AP —1) °° n2m—1)2 0" Bm —1ys 
44 og , 960-2) 9 tate 48 
n(n — 1)? 4020 n(n — 1)8 K3o Koo — jst 
2221) 1 18 6 20(n — 2) 12(n —2) 
Lo 0.0 1) 729° pla —1y 8 pal) ON a(n — 1) + alg —1)F MOOK 
8(4n? — 9n + 6) 72 72 24(n—2) , 
n*(n — 1)8 K40%31 tan 12 XsoKnk20t i — Teka Xo + oy 1p ok 
72(n —2) 48 
n(n—1)3 KsoX20%a + wags 
2220 12 24(n —2) 8(n —2) 
(00 0 2) “ast aR a Makin aq Bayete ag apse aay ae 
4(6nt—12n+9),, 12 | - 4 : 
n*(n — 1)8 K51 n*(n — 1) 40X29 n(n — 12 X40%11 + Gy — 12 X22 X20 
48(n— 2) 48(n — 2) 48 
n(n—1 2 5 K 31 Kap util 1)8 2. K20 tT (n—1)3 KeoKarku + Ga Sto%iy 
Tse ee 13 10 1 12(n — 2) 
“Yo 01 1) > 8“? nem —1) §® Kot 3n— 1 i) “a yt nn — =) 00X02 + 73m — 1)2 X22 a0 
16(n—2) 4(n — 2) 2(7n? — 16n + 11) 
n(n — 1)2 “41X21 * 7 2(mq — 12 Xo0%12 m(n—1)8 40X22 
2(9n? — 20n + 13) 80 
= KR +} 5 Kg Koo Ki + i Kao K 20K; 
niin-1P  n(n— 1)? 8 n(n 1)? 
32 20 40(n— 2) 4(n—2) 
tan— jpeXeeto tO pews + oy — 18 Ws Koka ki thin — 1 “18 K3oKoo 
28(n—2) , 24(n — 2) 40 ee 
n(n — ys Su 20 Tin — 1)3 “ao ‘20%i2 + yp sos + 7G 7p X20%o2» 








50X21 


} 42 
K30K11 











S210 3 14 s 2. 20(n — 2) 
(0 01 ) ~ B08 Bln — 1) <A * aq — 1) 8820 * pay — 1) “81 %08 Tam — 18 SOOKE 
+ Bm=2) eg Mm— 2) MImt—16n+ 1) ae 
n®(n — 1)? 4112 72(q — 1)2 “0X28 n*(n — 1)8 we va-i 
to ahh sh Kank + ee ee 
n(n — 1)? 31“11 n(n— 1)? 11*22"“20 n(n —1) 20"13 n(n — 1)? 40*11"02 
16 40(n—2) , 8(n — 2) 
+ iin — 1 Xs1 00% 00 + Fn — 19% 21 Ki tn — 18 a1 X00 Xo2 
32(n — 2) 16(n — 2) 32 3 16 . 
n(n — 18 X22 Xz ta(n—1) Keokukiat (ps 1120+ Gy — ya 11 X20K oa» 


Biv fs oe arg 4 n=2) |, 
‘Nol. ~ ms 58 n2(n — 1) “3820 n(n — 1) KqoK1 n*(n— 1) K51 Ko n(n — 1)?“ 














15(n — 2) 9(n — 2) (n—2) 5n?—12n+9 
n(n —1)2*% Kut sama 1pkakiat pag 1psXsoXs tam, — aye Kaos 
3(9n? — 20n + 13) 36 0 
m(n—1)3 K31 X22 tam pea tlm — 1211 20% 2a 
ges Kes Kani ro ae KaoK 11 Koa +— aia Ka 7-3 2) M3 K 
n(n — 1)2“81<20%oa + 2 — qe Xao%11 Kon ni 122 Kig n(n—1)3 <1 
12(n— 2) 30(n — 2) 24(n — 2) 
n(n — 1)3 “90X21 Kon + n(n — 1)8 20X21 X12 + n(n—1)8 Kyo Ki2K11 
PA a Sl a+— = ——~ Ky Koo K. 
n(n— 1) 30"03"20 (n —1)8 Keo *i1 (n —1) 2002119 
2200) 1 16 4 4 16(n— 2) 
M5 02 ») = ket aa 1 “8 Kut am —1) * 24 Keo + 7am — 1) <# Koz + 7 3(m — 1)2*s2*12 
16(n—2 8(2n? — 4 16 64 
+ aaah kn een kde nin 1) *81X18 + oq — pa Kaas 
~ ets 32 16 - , 1m=3] 3) 
n(n —1)2<2%n 13+ n(n— n(n —1)2*81X11 Koa + 7 n(n n(n — 1)2 “22X20 02 t im —1)3 Koa 
wep Kant nip kaka kat Ga Kook t Gan 
ee} Woda LE LOTE MRPON HIT. oe 
0211) n® 4 n{n-1) * ™* n{n-1) 2 4 * n\(n-1) @ @* n(n-1) * 
14(n —2) 2(n—2) 2(n — 2) 
n®(m —1)2 82X12 + pay — 12 Xan Kos + nn— n®(n —1)2 “14X20 
2(7n? — 16n + 11) (17n* — 38n + 25) 14 
m(n—1p “Xi t" “aq aps “a nay — 1) “aoXon 
+ 52 a ae 24 32 
n(n—1)2*% ut an — a(n — 1)2“22X20%oa +7 — ya 11 X20%1a + — 7) X11 Koa Ko 
pi yD eg OD 52n—2), 
n(n — 1)2<40%0a + — 12 X20 Koa n(n —1)3 “™ Koe n(n — 178 “11a X12 
+ 12(n— 2) slit 8(n — 8 6 gs et ea ont 
n(n — 1) Keo 12+ n(n — 1) 30%12"02 n(n — 1)8 300311 n(n —1)8 21"03"20 


12 32 4 
+ (n— peat (n— 1p3*t1*t0%0a + Gaps ptm 














( 11 )- l 6 4(n — 2) 


Ma aa a) 78 naa) oT aa ny tnt ~ wn a1) Xt2K02 + pam — 1a X14 Kao 


12(n — 2) 12(n — 2) 4(n —2) (n?—3n +3) 











n®(n — 1)2 “2821 + a(m — 12X82 X12 + n(n n®(n — 1)2 “410s n(n —1)8 K4oXoa 
3(5n?—11n+7) , . 4(4n2-,9n+6) 30 
mn—1)3 22 n{n—18 Ky Kis tim — 12 200222 
36 36 36 “ ‘ 
Fan — 1728111320 + 7 — 72X11 Koa Kas tT an— 12 X2eku ry — 220% 
goo. ee 12(n —2) 12(n-2) 
nlm — 1)2X02*40 +7 — 18 “an X12Xas +3 — Tp X11 Xs0% 0s n(n —1)8 21 Xe 
12(n —2) 12(n — 2) 12(n —2) 36g 
n(n — 1) <30%12X0e n(n—1)3 1220 n(n — 1)8 “2021 0s + 38 11K 20X02 
6 6 
+a ip MX + Gaps ll 
(2 3)\-1,. 4 9 9 —— 
0 0) ~ nt n—1 “oXmtS i Kot (n—1)(n—2)<” 
32) 1 4 6n 
Ko = 5 Kot Kaa kot Kaokus + 5 oka + Gy cg By SOK 
(5 ase PL * ae 3 6 ht 6n woth 
0 2) ~nK2t3 1 Xs To en tS so wt ares hy 2) “20 Lb 
(22) 1 l 
7) a) ae 51 were Kar Xs + KeoXon +5 Kao Xa + 5 Kt 
4n 2n 
*(n—1)(n— 5 — 2) Sto + Ty (wa) SOK 
3 0 l 9 9 6n 
«(4 3) or ee Kaku +7 Mis Katin- maa) 
21 or a 2 1 8 
"i 9) a" * 5-1 et 2 q M13%20 + Kar Koa + Soe ee 1 en 
ghia ae ons 
Ta-line 2) X11 X20%o2 + 1) (n— 2)“ 
(° 33) 1) a 23m 4) 27(4n —7) 
0 6 0) ~ 02% nim — 1) 70820 + ig — 12 Xeo%a0 t+ Fn — aye “soXeo 
54(4n —7) » , 162(5n—12) 36(7n? — 30n + 34) 





(m— 1? (nm — 2) “80% + (1) (nm — 2) “e0%00%00+ Ty Tay — aye 


108n(5n — 12) 
(n— 1)? (nm — 2)2 Koo 
















































































é. (° 3 Neds 21 gg 8  , 8n=4) |, 18(8n- 4) 
001) nn? n(in—1) 2? n(n—1)  U  n(n—1)2? © 2" n(in—-1)? 2 ™ 
12(4n—7) 15(4n —7) 24(4n —7) 
Kog “n(n —1)2 “s0Xa1 tig — aye Xa X40 (m— 1) (m— 2) “50%20%1 
30(4n —7) 72(5n — 12) 54(5n— 12) 
(n—1)?(n— 2) “a1 20 + G1)? (n—2) 1)? (n— 2) K31 K39K 20 m1) (n—2) 1)? (n— 9) “402120 
36(5n—12) | 36(7n®@—30n+84) , B6n(5n—12) 
Kou (a—1P(n—2) 0% 1h e—3F OO 1-2 
72n(5n — 12 
1X92 (n— r. (n— ? Konto 
(2?) it A MLM Ree 
ine 0 0 2) ~ n2X2 tai — 1) Kaku th 1) Ks2%20 tg — 1) SeoXte + i — aye “or Xan 
3(11n — 14) 3(5n — 8) 6(11n — 20) 9(3n —5) 
“n(n—1p? “# 30 n(n—1)? 50X22 “n(n—1)2 “41%s1 "n(n —1)2 “#2%eo 
6(5m—8) 9, 12(11m—20) 183-5) 
(n—1)?(n—2) ©" (n—1)?(n—2) 1" (n—1)?(n—-2) * ™ 
72(3n —7) 144(2n —5) 36(3n —7) 
(n—1)?(n—2) K31 KgqKq1 + (n—1)?(n—2) K31 Kai Ko9 + Ga — 1)? (m — 2) “22*20%20 
nay ORakn te  akiake to Mlk 
18(11n?— 48n + 56 36n 144n(2n—5 
20 Kh (n—1)?(n—2) ‘rook + (=I? (n—2) “2% * if (n= pak 
meat — 7 K39 Ka xt, 
(5 2 i) =a r 1 i cas 10 ie ag! 16 ae Ph dan! " 
01 1) > bX? a(n —y <Xe2 ty — 1) Kerk toy — 1) “82X20 t aig — 12 “eo%2 
4(10n — 13) (17m — 23) (19n — 34) 
n(n —1)2 K51 Koi + m(n—1)2 “42Xa0 “n(n —1)2 “50X22 
6 — fs = 5 ae 
en pr Kak en TE keke ip aay Root 
4(4n —7) 4(29n — 50) 2(31n — 55) 
(n= 1)? (m— 2) 9K20%02 + Ge ya — 9) Karas 20+ Gye — 9) Kaeo 
24(7n — 17) 24(11n — 26) 12(12n—29) 
(n— 1)?(n— 2) <3 Xs0%n1 + Ga — 1)? (nm — 2) “31X21 20 + (n—1)*(n— 2) <22*a0Xz0 
12(11n — 26) 36(2n —5) 6(5n — 12) 
(n—1)*(m — 2) “40X21 X11 + G18 (me — 2) K40%12K20 + 7a (mn — 2) “40 X20%o2 
6(11n®—48n +56) , 6(31n?—132n+148) 
(n= 1}? (n— 2% “8M TR (n— aye “90K 
ante = K12K3o +a =H Key KEK 11 a = aa = K39K30 Ko 
5 =F ia = Kao Kaos 













(? 2 ,) ~% 3 12 12 6(3n — 4) 


111 12X03 * nm — 1) 01%00 + aq — 1) Keka + Tm — 1) M4020 n(n — 12 “8112 


6(7n — 9) _ 4(5n —7) (n — 2) 6(n — 2) 


nln 1) “42821 iy — 12 $9990 Fg — 1)2 “00% os +m — 12 <s0Ki3 


3(13n — 22) 12(4n—7) 3(5n—9) 2 
n(n—1je “e229 — je Keak ty 1) “28%ao + (7 — ya X01 Ko2 














6(7n — 12) tits 12(3n — 5) ¥ 24(4n —7) , 
© (m= 1 (m= 2) TPE (m2) “402000 + Gy — 12 (q— By S00 a0Kts 
6(5n—9) 4g 24(2n—5) ,_24(8n-19) 
(n— 1)? (n—2) 9°" (n= 1)? (m— 2) “81 “a0%08 + Gy Ta (my — 2) Saran kn 
12(9n—22) | 24(5n—12) 60(3n—7) 
(n—1)*(n—2) “* 2011 Um — 18 (m— 2) “811220 + — 1p — 2) X22 21 29 
24(2n—5) - ,  6(7n—16) 12(5n— 12) 
(n — 1)? (n — 2) “18“s0%20 (m— 1)? (n— 2) <00%e1 X02 + yam — 2) XaoKia kn 
12(m—3)_ , 24(7n®— 30n + 34) 
(n— 1)? (n— 2) "40%0820T (nT) (m—2)2 “s0Xar 12 
2(5n? — 24n + 32) , 2(37n? — 156n + 172) 8n(n — 3) 
(n—1)?(n—2)2 “80X03 T7892 Kh + G18 — 22 X00 
12n(7n—16) 24n(8n— 19) 





(n—1)?(n— 2)2 “21 X20Xo2 + (n—1)*(n— pea X20 i 


24n(5n — 12) il 48n(2n — 5) 

(nm — 1)? (m—2)2 12°01 (py — 12 (m — 2y2 “90% 20%11 Kor 

8n(5n — 12) 

(n— 1)? (n—2)2*00Kt 
(; ae a Pau. Jie K -aplaeks Ka3k ae. O(5n — 8) 
003) mn?" n(n—1) "4" n(n—1) 8" n(n — 1) "9 Faq — Tye anon 

Pett. ae re 9 gins +, 2(5n—8) 54(n — 2) 
n(n—1) 9°" n(n — 1)" n(n — 1)? a1%oat oy — 12 Xo2Ka1 


18(5n—8)_ 
(n— 1)? (n—2) 
108(3n — 8) 162 108 
(n — 1)? (n —2) Kai Kar Xu + (n— 1)? “22*s0%11 +i —1p2*s1ki2X20 


108 18 


Ky kh ‘2 (n—1) 1)? “s2“20%1 *(@—1)(m—2) Keg 


Keo 


54(3n—8) _ - ” 
(n— 1)? (n—2) X22 a1 X20 + (n—1) KgoK12"11 + (m— 1)2X00%2 X12 


18(5n®— 24n + 32) 108n(3n — 8) 108n 
(n= (n—2)? “+ (1) 28 Xk + Ga ay Kia bok 


108” 
(n—1)*(n— 2) “™ 





Kis 





Koo 


Ky 


Kiy 


Koi Koo 


Ky2Kyy 


0X31 





* 
ain, 
= tt 
mm to 


)- 


Biometrika 38 










1 1 14 2 
2 <3 tim — 1) “89820 + a — 1) “82k t ad) 1) e102 + 2m — 1) e0%os 
(17m—23) | (44n—59) (19-25) (5n — 8) 
“mn— 1p “X12 te Kq2Ko1 mn—1) KsaKs0 t+ i — 1)? 1)? Ks0Kis 
2(19n — 34) e _ (58n— 92) 92) _ , 3(4n—7) 
“n(n — 1)? a1“22 Thin — 1p K32X31 n(n—1)? Ke3Ka0 
2(5n — 8) , ae e + Sic 38) wae 2(11 — 20) oat 
(n—1)?(n— 2) 50 Ky 02 —1)?(n— 2) Kay ll (n— 1)? (n— 2) 41 20Xo2 
2(53n — 92) 6(4n —7) 12(3n —7) 
(n—1)?(n— 2) “s2K20%n1 + (n—1)?(n— py XeaXto +7 l?(n— 2) <31Xa0%or 
6(41n — 98 
5 | Kakaku to aay 5) Kekooknt nade = K31Ki2K29 
(m — 1)? (nm — 2) 1)? (n — 2) (n — 1)? (n—2) 
9(19n — 46) 12(3n—-7) 12(2n —5) 
(n—1)?(n— 2) “2221 X20 + (n—1)*(n— 9) K13Ka0X20 + Te — 18m — By Kamen Kon 
3(19n — 46) 9 3(53n2 — 228n + 260) 


K4oKy2Kq3 + — 





(n— 1)? (n—2) (n— -— pe 40X03 X20 +. (n—1)?(n—2)? K30K21Ki2 


¢— Fg 4 Le Tn? ~ 30 + 34) 6n is 
(n— 1)? 30" 03 (n — 1)? (nm —2)? Kat (1)? (m—2) <8 30 





24n(2n — 5) . 6n(417 — 98) 
(nm — 1)? (nm — 2p? “21 “20X02 ¥, (nm —1)2(n—2)2*% Koo Kix 


6n(19n — 46) 24n(3n — 7) 








(n—1)?(n —2)2 2k + Gy 1)? (n— 2)2 X30%20%1 11X92 
6n(9n — 22 
(n= Ie = 2B Kot 
LRIPR iat 
n2 54 n(n ni 1) 4311 n(n —1) 5202 n(n —1) 3420 n(n— 1)? 4212 
22n-3) 19-25)  26n-7) (n—2) 
n(n — 1)? “51 Xos “nin—1)j2 <38%a1 n(n—1)2 KeaXs0 + n(n — 1)2 “soo 
2(9n — 16) 2(25n — 43) 2(17n — 30) (5n—9) _ 
"n(n — 12 “41% 13 ~n(n— 12 “32X22 “n(n — 1 “2831 +h — 12X14 X40 
2 ,  4(9n—16) 4(15n— 26) 
+ api keoKte + Gham — 2) ay Karki1X0nt Ge —Tyaqm — ay Seek 
4(10n — 17) 4(17n —30) 2(5n —9) 4 
(nm — 1)? (m — 2) “82*20%02 + (2 (m — 9) * 23 Kook + (n— 1)?(n—2) “14% 
4(21n —50) 2(21n — 52) 8(18n — 43) 


(n — 1)2?(n— 2) “318m Koo + 


10(21n — 50) 4(13n — 32) 4(7n — 18) 


(n—1)?(n—2) KooKgqXKoz + (n—1)2(m — 2) “21 X12%a1 


(n—1)? (n — 2) “221k + (a — 1) (m— 2) “192082 F 7 )F (m— 2) “Broo 
4(30n — 71) 4(21n — 50) 4(2n—5) xo) 

(n—1)®(m — 2) “221220 + 8 (im = 2) “18X21 20+ Ge 1a 2) “00 20%o4 
8(3n —7) 2(7n — 18) 


(n— 1)2(m — 2) “40%12% 02 + (n —1)®(m — 2) “#00311 


13 




























2(1922 — 84n + 100) 2(79n? — 336n + 376) 
(n—1p?(n—2)? “XoaXn tT Tain —2ye “a8 
8(7n2 — 30n + 34) in(7m—18) 2, 
(n—1)?(n—2)? i (n—1)?(n—2)? 4% 
8n(21n — 50) 16n(3n—7) 
(m— 1)? (m — 2)2 “2042111 Koz + 1)? (n— 22 <20Ko2 Kia 
_4n(21n — 50) s 8n(18n — 43) ali % 8n(2n — 5) se 
(n—1)?(n— 2)e X21 * (n— 1)? (n— 2)? 112012 (n— 1)? (n— 2)2 3020" 02 
4n(13n — 32) 


(n— 1) (n—2)2 (n—2)2 Ki, K39 Koa 


Kj aa me paths Keak: ree — Keo K; Kon + . Kz, K, 
0 2 2) ~ PX Tai — 1) “89% nim — 1) 8202 an — 1) en 1)?" afn—-1) 
4(7n—10) 4(10n—18) , <4), a 
n(n — 1)? 4212 n(n —1)? 3321 n(n — 1)2 24 K3o n(n—1) 50" 04 
4(4n— 2, Ngg + 22OM HAD ggg 4 12BM=F), 4 B=), 
n(n Ty? Xa 813 tin — 1p “ 2 Kee +—7 nn — 1 Ko3K31 n(n —1)2 14*40 
me et n-D A934), 
(n—1)(n-: 2) 50 Ko2 + (n— 1)2 (1 (n— 2) 4111" 02 (n— 1)? (n—2) 32™11 


A(7n— 13) _ 24(3n —5) 


6 
2 
(n—1)*(n—2) K32KoqKo2 + in—1P(n— ‘(n — 2) X2820%i1 i p— 1p “1420 








24(3n —7) 12(3n —7) 24(7n — 17) 
(n—1)?(n— 2) “a1 X21 Koz + Cy — 1)? (n — 2) 22 *so%o2 + Gy — 1)2(m — 2) “92121 


12(21n — 50) 24(2n —5) 24 
(n—1)?(n— 9) Kee Kaku + Gi —1)2(n—2) 2) K13K39 Ku + — 12 “a1 “os “20 
12(9n — i wattthaoa 24(3n —7) 


* (n—1)? (n— 2) <e2“12X ‘20 +7 (nm — 1)®(m — 2) “18X21 X20 + ya S04 20X20 


12(n—3) 12 


2 36 
(m — 1)? (m— 2) X40%12K02 + (n — 1)240%os%11 in — 1230 Kos Ke 





24(7n*—30n +34). | 24(2n?—9n+ 11) din 
(n—1)?(n—2)? “7° ™ (n—1)?(n—2)? KieXs0 + Go 1)? (n—2) K20%11 Kos 


48n(3n — 7) sie 24n(n — 3) 
(n—1)2(n 2 (n — 22 2021 Ks1 Koa + 1? (n— D212 t0Xo2 











_24n(5n — 12) _ , 24n(7m—17) 12n ? 

(n—1)? (n— 2)? Koi Ku (n— 1)? (n— 2)? 12"20 Sioa —1)?(n— 2) Ko 20Kbz 
24n(2n --5) 

(a= 1}? (n— 2°90 Kon 





(° r ) = ee 3 6 3 
031) n® *° n(n 1) Kas ‘uta ~) Xs Kor + im — 1) 20+ rp — 1 Noten 


3(10n — 13) 3(13” — 19) 9 3 in — 8) 
n(n m—1y * ‘42Ky2 + “n(n—1)* K33 Ka + nin—1)* “24X30 n(n — ype Xa Xt 








2 
0 Koz 


12%y1 


11X93 


Koz 







3(19n — 34) 3(11n — 20) 


“nin— 1p? “82% Ge * 23 ‘at Gi — 1) X14K40 


6(5n — 8) 24(4n —7) 18 
(n— 1)? (n— 2) “411102 + G2 — 2)* 32 ti+G—pp — 12 “s220%o2 
6(11n— 20 18(3n—8 
Cae (n—1)(n—2) “4 tet hank 
4 2Te e  80Gn— 12), 9(33n—82) 
(nm — 1)2<22*a0%o2 + (7 — 1)8(m — 2) <8 M12 Ki (n—1)?(n—2) 
54 18 36(3n —7) 
tho (m—1p K13 KaoKia + Gy — 12 “31 Koa 20+ Gy —3)8 i — 2) Kae X12 X20 
18(3n — 8) 9 
* (= 1 (m—2) (n—1) 
27 9(19n?—84n +100) , 
+ Ga — 172 X30%os Xan + (n—1)?(n—2) 


(n— 
18n 36n(3n — 8) 
(n—1)?(n 2) 2 XoK + IP (n— 2)2 <20%21 X11 Xo2 


Ko3Ko9Kq1 + 
KooKe Ky 
Ky3Ko1Ko9 + 


9 
Kao X12Ko2 *@-1) =i) KaoXo3K11 





54, 
Koi Kie +i — 1812s 





18n 18n(9n — 22) 
+ @—1P(n—2 ~ By K12\doKo2 + 

54n e 
+ (n—1)?(n—2) K39K 41 Koo- 


36n(5n — 12) 
(a—1)?(w— 2+ 1a OP 





Kiko Ky 





—. . 2 
“Lo 0} = n X97 7 SeoKem 
31 1 3 
Ko J =X ty ek th Xo 
22) 1 2 4 
«(| 0) = 5 hat 1 Kok +o ha Kao» 
3 0 1 6 
K(, ) Tah meek AP 
21 ] 2 1 
3 OF bay ae kak “1 to X18K20 +7 Xa0Xoa 


12 1 + 2 
Klo o} =n Xt —y Merk ty Ma X20- 


I am indebted to Dr F. N. David and Dr John Wishart for a number of suggestions and 
help in the presentation of this paper. 


REFERENCES 


David, F. N. & Kendall, M. G. (1949). Biometrika, 36, 431.
Fisher, R. A. (1929). Proc. Lond. Math. Soc. 30, 199.
Kendall, M. G. (1940). Ann. Eugen., Lond., 10, 392.
Wishart, J. (1928). Proc. Lond. Math. Soc. 29, 309.










RANDOM DISPERSAL IN THEORETICAL POPULATIONS 


By J. G. SKELLAM 
The Nature Conservancy, London 


SYNOPSIS 


The random-walk problem is adopted as a starting point for the analytical study of dispersal 
in living organisms. The solution is used as a basis for the study of the expansion of a growing 
population, and illustrative examples are given. The law of diffusion is deduced and applied 
to the understanding of the spatial distribution of population density in both linear and 
two-dimensional habitats on various assumptions as to the mode of population growth or 
decline. For the numerical solution of certain cases an iterative process is described and 
a short table of a new function is given. The equilibrium states of the various analytical 
models are considered in relation to the size of the habitat, and questions of stability are 
investigated. A mode of population growth resulting from the random scattering of the re- 
productive units in a population discrete in time, is deduced and used as a basis for a study on 
interspecific competition. The extent to which the present analytical formulation is applicable 
to biological situations, and some of the more important biological implications are briefly 
considered.


1. INTRODUCTION 


1-1. It is now fifty years since the publication of The Origin of the British Flora by Clement 
Reid (1899). In it is suggested an interesting numerical problem on the rate of dispersal of 
plants. Reid states: ‘Though the post-glacial period counts its thousands of years, it was not 
indefinitely long, and few plants that merely scatter their seed could advance more than 
a yard in a year, for though the seed might be thrown further, it would be several seasons 
before an oak for instance, would be sufficiently grown to form a fresh starting point. The 
oak, to gain its present most northerly position in North Britain after being driven out by 
the cold, probably had to travel fully six hundred miles, and this without external aid would 
take something like a million years.’ 


1-2. At the end of the last century, biologists, unlike physicists, rarely formulated such 
problems in terms of simplified abstract models, due no doubt to the comparatively greater 
complexity of biological systems. A beginning might have been made on the subject of 
dispersal, for much of the necessary mathematical technique had been developed already, 
and, in fact, had been utilized by Maxwell (1860) in developing a kinetic theory of gases 
based on the behaviour of an infinity of perfectly elastic spheres moving at random. The 
present century has witnessed the great success of the analytical method to quote only the 
work of Fisher (1930), Haldane (1932) and Wright (1931) in evolutionary genetics, and 
of Volterra (1931), Lotka (1925, 1939) and Kostitzin (1939) in ecology. Nevertheless, 
biologists as a whole have been reluctant to develop the analytical as distinct from the 
purely statistical approach, and apart from the pioneer work of Karl Pearson (1906) and of 


Brownlee (1911), the mathematical aspects of the problem of dispersal have not received the 
attention they deserve. 










2. RANDOM DISPERSAL AND DIFFUSION 


2-1. Random displacement. The mathematical methods employed in the present treatment 
are largely the natural outcome of the solution of the well-known random-walk problem 
associated with Pearson (1906), Rayleigh (1919) and Kluyver (1905), and reviewed more 
recently by Chandrasekhar (1943). In the simplest form of this problem, a particle on a line 
jumps one place to the right or one to the left, both being equally likely. It is easily seen that 
the probability distribution is: 


Position         -3    -2    -1     0     1     2     3
After 1 jump      .     .    1/2    .    1/2    .     .
After 2 jumps     .    1/4    .    1/2    .    1/4    .
After 3 jumps    1/8    .    3/8    .    3/8    .    1/8

and so on, with the general result that after n jumps the distribution is binomial with
variance equal to n. As n becomes large the distribution ‘tends to’ normality.
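The table and the statement about the variance can be reproduced exactly. The following Python sketch builds the distribution jump by jump in rational arithmetic; the function names are illustrative only.

```python
from fractions import Fraction

def walk_distribution(n):
    """Exact distribution of position after n jumps of +1 or -1, each with probability 1/2."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for pos, prob in dist.items():
            for step in (-1, 1):
                nxt[pos + step] = nxt.get(pos + step, Fraction(0)) + prob / 2
        dist = nxt
    return dist

def variance(dist):
    """Variance of a discrete distribution given as {position: probability}."""
    mean = sum(pos * prob for pos, prob in dist.items())
    return sum((pos - mean) ** 2 * prob for pos, prob in dist.items())

after_three = walk_distribution(3)
print(sorted(after_three.items()))
print(variance(walk_distribution(10)))
```

Each jump has variance 1 and the jumps are independent, which is why the variance after n jumps equals n.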

A natural extension of the problem to N dimensions, in which the particle moves on a 
rectangular lattice, yields ‘in the limit’ an N-variate normal distribution which is the 
product of N independent distributions.

An alternative model for the two- or three-dimensional analogue consists of a randomly 
kinked chain with rotatable links. If one end is fixed, the probability distribution of the 
position of the free end tends to normality when the number of links is large (see Uspensky, 
1937). In the two-dimensional case with only seven links it has been shown by actual 
calculation by Pearson (1906) that the normal distribution provides a remarkably good 
approximation. 

The model considered by Pearson is, however, somewhat artificial when applied to 
migration in that the links (or flights) are taken as being equal in size. In practice they will 
vary. The important special case where the probability distribution of the end-point of 
a flight is bivariate-normal (with the starting point of the flight as the centre) has been 
considered by Brownlee (1911). A solution of the more general case is outlined below. 


2-2. A spreading population. Consider a surface of unlimited extent approximating to 
a Euclidean plane with co-ordinate axes OX and OY, and in the immediate neighbourhood 
of the origin let there be to start with a single self-reproducing particle of a species not 
otherwise represented. The probability distribution of the position of a particular descendant 
after n generations may be deduced once the characteristic function of the probability 
distribution of offspring about their parent is given. For a parent located at $(\xi, \eta)$, let this function be

$$\phi(t, u \mid \xi, \eta) = \exp\{i\xi t + i\eta u + \Sigma(\kappa_{rs}\,t^r u^s\, i^{r+s}/r!\,s!)\}.*$$

The $\kappa_{rs}$ are assumed independent of $\xi$ and $\eta$, for it is reasonable to suppose that the location of the parent does not affect the form of the distribution of the offspring. Let the cumulative distribution function for the position of a particle in the nth generation be $F_n(\xi, \eta)$ with $\phi_n(t, u)$ as the corresponding characteristic function. Assuming that the generations do not overlap in time, we then have

$$\phi_{n+1}(t, u) = \iint \phi(t, u \mid \xi, \eta)\, dF_n(\xi, \eta) = \phi(t, u \mid 0, 0) \iint e^{i\xi t + i\eta u}\, dF_n(\xi, \eta) = \phi(t, u \mid 0, 0)\,\phi_n(t, u).$$


* Here and below in this paper the customary parentheses surrounding terms in the denominator, 
to the right of the solidus, have been omitted. Thus $\ldots/r!\,s!\}$ should be printed $\ldots/(r!\,s!)\}$ in
accordance with the common convention. It is considered, however, that no confusion will result 
here from such omissions. 











Clearly $\phi_n = [\phi(t,u \mid 0,0)]^n$. It is apparent that all cumulants increase n-fold (as in the
summation of independent variables). Standardization of the scale is effected by changing
$n\kappa_{rs}$ to $n\kappa_{rs}/n^{(r+s)/2}$. The approach to normality is obviously very rapid when the distribution
of offspring about their parent is roughly normal to begin with. In the present treatment
we assume that $\kappa_{10} = \kappa_{01} = 0$ so that there is no drift. Nevertheless, it should be noted that
a slight systematic drift, no matter how small, is ultimately the most important cause of 
displacement, that is, when n is chosen sufficiently large. 

The distribution of the position of a particle of the nth generation will henceforth be
taken as
$$dF_n(x,y) = \pi^{-1} a^{-2} n^{-1} \exp\{-[x^2 + y^2]/na^2\}\, dx\, dy. \tag{1}$$

Since we shall be concerned mainly with absolute distances from the origin, it is convenient
to make the polar transformation $x = r\cos\theta$, $y = r\sin\theta$. From (1) we then obtain
$$dF_n = \pi^{-1} a^{-2} n^{-1} \exp\{-r^2/na^2\}\, r\, dr\, d\theta \quad (0 \le \theta < 2\pi,\ 0 \le r < \infty). \tag{2}$$


If $r_1$ is the distance from the origin of an individual value chosen at random, then the
probability that $r - \tfrac12 dr < r_1 < r + \tfrac12 dr$ is obtained by integrating over $\theta$, giving the radial
probability density
$$\exp\{-r^2/na^2\}\, 2r/na^2 \quad (0 \le r < \infty). \tag{3}$$
The parameter $a^2$ is to be interpreted as the mean-square dispersion per generation, analogous
with the mean-square velocity in Maxwell’s distribution. The maximum likelihood estimate
of $a^2$ based on $\nu$ observed values of $r$ is simply $\hat{a}^2 = \Sigma r^2/n\nu$.
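
The estimate just stated may be tried on simulated data. The following is a minimal sketch, assuming that distances are drawn from density (3) by inverting equation (4); the seed, sample size and the values of n and a are arbitrary choices made here for illustration.

```python
import math
import random

def sample_radius(n, a, rng):
    """Draw one radial distance from density (3) by inverting exp(-r^2 / (n a^2)) = u."""
    u = 1.0 - rng.random()          # uniform on (0, 1], so log(u) is always defined
    return math.sqrt(-n * a * a * math.log(u))

def estimate_a2(radii, n):
    """Maximum likelihood estimate: sum of r^2 divided by n times the number of values."""
    return sum(r * r for r in radii) / (n * len(radii))

rng = random.Random(1951)           # fixed seed, arbitrary
n, a = 10, 2.0                      # illustrative values only
radii = [sample_radius(n, a, rng) for _ in range(20000)]
a2_hat = estimate_a2(radii, n)
print(a2_hat)                       # should lie close to a^2 = 4
```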

Of the population spread out after n generations, that proportion lying outside a circle
of radius $R$ is
$$p = \int_R^\infty \exp[-r^2/na^2]\, 2r\, dr/na^2 = \exp\{-R^2/na^2\}. \tag{4}$$

In a subsequent argument it is essential to make $p$ very small indeed. By choosing $p = 1/N_n$,
where $N_n$ is the final population after n generations, it follows that only one particle can be
expected outside a circle of radius $R_n = (na^2 \log N_n)^{1/2}$. If during this period the population
has increased geometrically in accordance with the relation $N_n = e^{n\gamma^2}$ (where $c = \gamma^2$ may be
termed the Malthusian parameter, as in Fisher (1930)), we obtain immediately the relation
$$R_n/n = a\gamma. \tag{5}$$
If a circular contour is drawn in the XY plane through all points having an arbitrarily
chosen constant areal population density, then as time passes, the contour expands outwards.
From the condition $\gamma^2 n - \log n - r^2/na^2 = \text{constant}$, it is easily seen that
$$r_{n+1} - r_n \to a\gamma. \tag{5a}$$
The ultimate velocity of propagation of this ‘wave-front’ (D. G. Kendall, 1948) is therefore 


the same as the constant velocity of expansion of the circle within which all but one of the 
population can be ‘expected’ to lie. 
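
The limit (5a) can be checked numerically. The sketch below solves the contour condition for the radius and prints successive increments; the constant k (which fixes the density level) and the values of a and γ are assumptions made purely for illustration.

```python
import math

def contour_radius(n, a, gamma, k):
    """Radius r_n solving gamma^2 * n - log(n) - r^2 / (n * a^2) = k."""
    inside = n * a * a * (gamma * gamma * n - math.log(n) - k)
    return math.sqrt(inside)

a, gamma, k = 1.5, 0.8, 0.0         # illustrative values, not taken from the paper
for n in (10, 100, 1000, 100000):
    step = contour_radius(n + 1, a, gamma, k) - contour_radius(n, a, gamma, k)
    print(n, step, a * gamma)       # the increment approaches a * gamma as n grows
```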


2-3. A numerical illustration. Let us consider the application of equation (4) to Reid’s 
problem. We can clearly establish a rigorous conclusion in the form of an inequality provided 
that we can fix appropriate bounds to the various parameters. 

The oak does not produce acorns until it is sixty or seventy years old, and even then it is 
not mature. It then produces acorns over a period of several hundred years. The rate of 
population expansion is obviously very much less than it would be if all the immediate 
offspring of a single individual could come into existence in the sixtieth year of its life. We
are certainly safe in taking the generation time as not less than sixty years. Now according 
to De Geer (1910) the final recession of the ice started at about 18,000 B.C., and yet in Roman
Britain (Tansley, 1939, p. 171) oak forests were apparently well established throughout the 
country. It appears safe to take n < 300. 

The number of acorns produced by an oak in the course of its life must be prodigious. 
Nevertheless, under typical present-day conditions almost every acorn which falls to the 
ground in autumn is consumed by birds and mammals (Watt, 1919). Of those remaining, some 
decay and many fail to germinate. Mortality among the young seedlings is very high, and 
even when not overshadowed it seems that only about 1% of the seedlings are likely to 
survive the next three years. Though there has been much speculation, little is really known 
about the biological conditions following the last glaciation. Even so, it seems perfectly 
safe to assume that the average number of mature daughter oaks produced by a single parent 
oak did not exceed 9 million. To most biologists such a figure must appear unnecessarily 
high, but we must remember that it is the rate of population growth at the periphery of the 
advancing ‘wave’ which matters most, and here there will be least competition. 

The fact that the British Isles are of finite extent with irregular coastline does not detract 
from the power of the argument, for it is obvious that such a shape in no way encourages the
advancing ‘wave-front’.

We then have $R/a < 300\sqrt{\log 9{,}000{,}000} = 1200$.

In the original form of the problem as stated by Reid, $R$ is given as 600 miles. It then
follows that $a$ (the root mean square distance of daughter oaks about their parent) $> \tfrac12$ mile.
On these premises the conclusion which Reid reached appears inescapable—namely, that 
animals such as rooks must have played a major role as agents of dispersal. 
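
The arithmetic behind this bound may be reproduced as follows. The sketch assumes natural logarithms and uses $R_n = (na^2 \log N_n)^{1/2}$ with $N_n$ taken as the n-th power of the assumed maximum number of daughter oaks per parent; the variable names are introduced here for illustration.

```python
import math

n_max = 300                 # upper bound on the number of generations
offspring_max = 9_000_000   # upper bound on mature daughter oaks per parent
R_miles = 600.0             # distance quoted from Reid

# log N_n = n * log(offspring_max), hence R/a = n * sqrt(log(offspring_max))
ratio_bound = n_max * math.sqrt(math.log(offspring_max))
a_min = R_miles / ratio_bound           # lower bound on a, in miles
print(round(ratio_bound, 1), round(a_min, 3))
```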

If, however, we regard the upper bounds of a, n, $\gamma$ (given above) as collectively excessive,
we are driven to accept the view that the last glaciation was not as extensive as was previously 
believed (W. B. Wright, 1937), and to suppose that the oak population regenerated from 
scattered pockets which survived in favourable valleys. 

At this point it is interesting to consider the actual succession of forest trees since glacial
times as revealed by pollen analysis (Godwin, 1934). In late sub-arctic and pre-boreal times, 
birches with their small, winged fruits, willows with their plumed seeds, and pine with its 
winged seeds, were able to spread quickly and were plentiful. Hazel, however, did not reach 
its maximum until Boreal times, and oak with still larger fruits not until much later. It is 
quite possible that the relatively different rates of dispersal were not unimportant in 
determining the order of succession. 

2-4. The dispersal of small animals. The results already deduced can be applied equally 
well to the dispersal of small animals such as earthworms and snails. For example, if the 
mean square ‘dispersion’ per minute of a wingless ground beetle wandering at random is 
1 yard?, then after a season of 6 months, say, without rest, the R.M.S.D. of the resulting 
probability distribution is only 500 yards. The probability that after 6 months the beetle 
wanders more than a mile from the starting point is less than 8 in a million. Without 
external aid a period of time equivalent to 1,000,000 seasons would be required to raise the 
R.M.S.D. to 290 miles. The figures, however, are slightly deceptive, for the final position 
arrived at is in general a great deal nearer the origin than the farthermost position previously 
reached. Nevertheless, calculations of this kind clearly indicate in these cases the importance 
in the long run of rare accidental displacements due to external agencies such as high gales, 
floating plant material, the muddy feet of birds and mammals. 
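
The figures quoted for the ground beetle follow from equation (4). The sketch below assumes a season of 182.5 days (the text says only ‘6 months’); the constant names are introduced here for illustration.

```python
import math

MINUTES_PER_SEASON = 182.5 * 24 * 60   # assumption: a 6-month season taken as 182.5 days
YARDS_PER_MILE = 1760

ms_per_minute = 1.0                    # mean-square dispersion per minute, in yard^2
ms_season = ms_per_minute * MINUTES_PER_SEASON

rmsd_season = math.sqrt(ms_season)                                   # yards
p_beyond_mile = math.exp(-YARDS_PER_MILE ** 2 / ms_season)           # equation (4), R = 1 mile
rmsd_million = math.sqrt(1_000_000 * ms_season) / YARDS_PER_MILE     # miles
print(rmsd_season, p_beyond_mile, rmsd_million)
```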








2-5. Empirical confirmation. In practice there is rarely sufficient information to construct 
the contours of population density with accuracy. One contour, however, can sometimes 
be drawn—that for the low ‘threshold’ density (depending on the thoroughness of the survey) 
at which the population begins to escape notice altogether. 

Equation (4), derived initially on theoretical grounds, is well illustrated by the spread of 
the muskrat, Ondatra zibethica L., in central Europe since its introduction in 1905. Fig. 1, 
based on Ulbrich (1930), shows the apparent boundaries for certain years. If we are prepared 
to accept such a boundary as being representative of a theoretical contour, then we must 
regard the area enclosed by that boundary as an estimate of $\pi r^2$. The relation between the
time and $\sqrt{\text{area}}$ is shown graphically in Fig. 2.


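[Fig. 1. Apparent boundaries of the muskrat population for certain years (map, with Munich marked). Fig. 2. √Area plotted against year, 1910 to 1930.]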


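2-6. The analogy with diffusion. If a random particle suffers a displacement $\epsilon$ in any
direction at regular intervals of time $(t, t+\omega, t+2\omega, \ldots)$, and if the probability density is
denoted by $\psi$, it is clear that $\psi(x, y, t+\omega)$ is the mean value of $\psi(\xi, \eta, t)$ for all $(\xi, \eta)$ on a circle
of radius $\epsilon$ around $(x, y)$. That is,
$$\psi(x, y, t+\omega) = \frac{1}{2\pi}\int_0^{2\pi} \psi(x + \epsilon\cos\theta,\ y + \epsilon\sin\theta,\ t)\, d\theta.$$
Expanding by Taylor’s theorem and noting that
$$\int_0^{2\pi} \cos\theta\, d\theta = \int_0^{2\pi} \sin\theta\, d\theta = \int_0^{2\pi} \sin\theta\cos\theta\, d\theta = 0,$$
$$\int_0^{2\pi} \cos^2\theta\, d\theta = \int_0^{2\pi} \sin^2\theta\, d\theta = \pi,$$
we obtain for infinitesimally small $\epsilon$ and $\omega$ the relation
$$\frac{\partial\psi}{\partial t} = \frac14 \frac{\epsilon^2}{\omega} \nabla^2\psi, \tag{6}$$
where $\nabla^2 = \partial^2/\partial x^2 + \partial^2/\partial y^2$.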








The partial differential equation (6) is of course well known in connexion with the conduction
of heat and related problems in mathematical physics (Jeans, 1921). The one-dimensional
form has been applied by R. A. Fisher (1922) to the problem of the distribution
of gene differences, and the three-dimensional form by Rashevsky (1948) to problems 
arising from the diffusion of materials from and into living cells. 

It should be noted that the fundamental differential equation (6) is satisfied by the
distribution
$$\psi = \pi^{-1} a^{-2} t^{-1} \exp\{-(x^2 + y^2)/ta^2\},$$
already considered (see equation (1)), with the unit of time as one generation and $a^2 = \epsilon^2/\omega$.
It is apparent that many ecological problems have a physical analogue, and that the 
solution of these problems will require treatments and the use of functions with which we 
are already very familiar. Unlike most of the particles considered by physicists, however, 
living organisms reproduce, and members of the same and of different species interact. As 
a result the equations of mathematical ecology are often of a new and unusual kind (see, for 
example, Volterra (1931)), and require special treatment. 
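
The statement that distribution (1), with t in place of n, satisfies equation (6) can be checked by finite differences. This is a rough numerical sketch only; the step h and the test points are arbitrary choices made here.

```python
import math

def psi(x, y, t, a):
    """Distribution (1) with t in place of n: exp(-(x^2 + y^2) / (t a^2)) / (pi a^2 t)."""
    return math.exp(-(x * x + y * y) / (t * a * a)) / (math.pi * a * a * t)

def residual(x, y, t, a, h=1e-3):
    """Central-difference estimate of d(psi)/dt - (a^2 / 4) * Laplacian(psi)."""
    d_t = (psi(x, y, t + h, a) - psi(x, y, t - h, a)) / (2 * h)
    lap = (psi(x + h, y, t, a) + psi(x - h, y, t, a)
           + psi(x, y + h, t, a) + psi(x, y - h, t, a)
           - 4 * psi(x, y, t, a)) / (h * h)
    return d_t - 0.25 * a * a * lap

print(residual(0.3, 0.4, 2.0, 1.0))   # should be close to zero
```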


3. DISPERSAL AND POPULATION DENSITY 


3-1. Modes of population growth. Many problems on dispersal cannot be formulated unless 
some law of population growth (in the absence of dispersal) is assumed. As long as the 
population is small or shows a natural tendency to decrease, the Malthusian law $dN/dt = cN$
is usually satisfactory. If the population is not small the Pearl-Verhulst logistic law is more 
appropriate. 

This law may be written in the form
$$\frac{dN}{dt} = cN - lN^2, \tag{7}$$
where, following Kostitzin (1939), c may be termed the coefficient of increase and l the
limiting coefficient. Since a stable population level exists at $M = c/l$, the law is often written
in the form
$$\frac{dN}{dt} = cN(M - N)/M. \tag{7a}$$

For our present purpose it is convenient to use the concept of population density denoted
by $\Psi$, a function of time and place. In general, the vital coefficients will also vary. The
logistic law may then be written
$$\frac{\partial}{\partial t}\Psi(x, y, t) = c_1(x, y)\,\Psi - c_2(x, y)\,\Psi^2. \tag{8}$$

In the simplest cases where the vital coefficients remain constant throughout the habitat
under consideration, the unit of population density may be chosen for convenience as that
prevailing when $t \to \infty$, and the law then takes the very simple form
$$\frac{\partial\Psi}{\partial t} = c\Psi(1 - \Psi). \tag{9}$$
When conditions become extremely unfavourable and ultimate survival not possible, the
logistic law (7) may still be applied with c negative ($= -g^2$). The population declines rapidly
unless maintained by immigration. Such a population may be termed negative-logistic.








A convenient change of scale then gives
$$\frac{\partial\Psi}{\partial t} = -g^2\,\Psi(1 + \Psi).$$


Two further laws of population growth applicable to populations which are discrete in 
time are deduced in §§4-1 and 4-3. 


3-2. Centres of multiplication and extinction. We now consider the distribution in space of
the density of a population which, by reason of random dispersal, tends to flow from regions
in which conditions are favourable to regions in which they are unfavourable. Whether
a steady state can exist or not depends on the ability of the population in the favourable 
regions to make good the decline in the unfavourable ones. 

Mathematically it is simpler to consider a linear habitat such as a coastal zone (Fisher, 
1937) or the bank of a river. In these cases we are justified in assuming that an adjustment 
has been made in the vital coefficients to allow for diffusion inland, for this danger may be 
regarded as one of the many hazards associated with the habitat. Moreover, as will be 
apparent later, the mathematical form of the Malthusian law and of the logistic law (7) 
persists almost unchanged when the rate of diffusion inland is deducted from the rate of 
population growth. 

The two-dimensional problems which most readily lend themselves to mathematical 
treatment are those with radial symmetry. Rough approximations to circles are sometimes 
afforded by islands, hill tops, marshes, patches of woodland, and approximations to annuli 
are provided at the margins of lakes, by the zones on hills of a conical or hemispherical shape, 
or even by the zonation created by rabbits around their individual burrows or around such 
patches of woodland as they infest (see, for example, Tansley, 1939, pp. 587, 505, 138, 141). 


3-3. Malthusian populations in linear habitats. The one-dimensional form of (6) expressing
the effect of random dispersal is
$$\frac{\partial\psi}{\partial t} = \frac12 \frac{\epsilon^2}{\omega} \frac{\partial^2\psi}{\partial x^2}. \tag{10}$$

In terms of population density, with $a^2 = \epsilon^2/\omega$, we then have as the combined effect of
dispersal and population growth
$$\frac{\partial\Psi}{\partial t} = \tfrac12 a^2 \frac{\partial^2\Psi}{\partial x^2} + c\Psi. \tag{11}$$
The condition for a steady state, assuming it to exist, is
$$\tfrac12 a^2 \frac{d^2\Psi}{dx^2} + c\Psi = 0. \tag{12}$$

Case (i). Bordering on a suitable habitat is an unfavourable zone extending indefinitely in
one direction: $c = -g^2$ in $x_0 \le x < \infty$, $\Psi(x_0) = B$, $\Psi(\infty) = 0$.
Solution: $\Psi = Be^{-m(x - x_0)}$, where $m = g\sqrt2/a$.
Case (ii). An unfavourable interval is flanked at both ends by a suitable habitat:
$c = -g^2$ in $-x_0 \le x \le x_0$, $\Psi(-x_0) = \Psi(x_0) = B$.

Solution: $\Psi = B\cosh mx/\cosh mx_0$.














Case (iii). An apparently suitable habitat is flanked on either side by conditions which are
extremely unfavourable. (It is assumed, for simplicity, that the organisms show no marked
negative tactic response to the changed conditions, so that there is no appreciable reflexion
at the boundary): $c = \gamma^2$ in $-x_0 \le x \le x_0$, $\Psi(-x_0) = \Psi(x_0) = 0$.

Solutions that are non-negative but not zero everywhere in the interval are possible only
when $x_0 = \tfrac12\pi a/\gamma\sqrt2$, and we then have
$$\Psi = C\cos(x\gamma\sqrt2/a). \tag{13}$$


When $x_0$ is greater than the critical value given above, the population increases indefinitely,
and when $x_0$ is less than this value the population declines to extinction. For consider the
partial differential equation (11) with $c = \gamma^2$ subject to the boundary conditions
$$\Psi(x_0, t) = \Psi(-x_0, t) = 0 \quad (0 \le t < \infty),$$
and an arbitrary initial state developed as a Fourier half-range sine series
$$\Psi(x, 0) = \Sigma A_s \sin(\tfrac12 s\pi[x + x_0]/x_0).$$
The solution is
$$\Psi(x, t) = \Sigma A_s e^{k_s t} \sin(\tfrac12 s\pi[x + x_0]/x_0),$$
where $k_s = \gamma^2 - \tfrac12 a^2(\tfrac12 s\pi/x_0)^2$. The first of the $k_s$ is obviously the greatest, so that in the course
of time the solution is dominated by its first term, which tends to $\infty$, (13), or 0 as $x_0 \gtreqless \tfrac12\pi a/\gamma\sqrt2$.
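
The role of the critical length can be seen by evaluating the exponents $k_s$ directly. The following sketch uses illustrative values of a and γ that are not taken from the paper; the function names are introduced here.

```python
import math

def critical_half_length(a, gamma):
    """Critical half-length x0 = (pi / 2) * a / (gamma * sqrt(2)), as in (13)."""
    return 0.5 * math.pi * a / (gamma * math.sqrt(2.0))

def growth_rate(s, x0, a, gamma):
    """Exponent k_s = gamma^2 - (a^2 / 2) * (s * pi / (2 * x0))^2 of the s-th mode."""
    return gamma ** 2 - 0.5 * a ** 2 * (0.5 * s * math.pi / x0) ** 2

a, gamma = 100.0, 0.2               # illustrative values only
x_crit = critical_half_length(a, gamma)
for factor in (0.9, 1.0, 1.1):      # habitat smaller than, equal to, larger than critical
    print(factor, growth_rate(1, factor * x_crit, a, gamma))
```

The sign of the first exponent changes from negative to positive as the half-length passes through the critical value, while all higher modes decay faster or grow more slowly than the first.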


3-4. Logistic populations in linear habitats. From (10) and (9) we obtain the fundamental
equation
$$\frac{\partial\Psi}{\partial t} = \tfrac12 a^2 \frac{\partial^2\Psi}{\partial x^2} + \gamma^2\,\Psi(1 - \Psi). \tag{14}$$
Mathematically this is the same as an equation of R. A. Fisher (1937) given as
$$\frac{\partial p}{\partial t} = k\frac{\partial^2 p}{\partial x^2} + mpq$$
in connexion with the problem of gene flow in a linear population. Gene frequency p corresponds
to population density and the term in $pq = p(1 - p)$ expressing the effect of natural
selection corresponds to the term in $\Psi(1 - \Psi)$ expressing logistic growth. The results deduced
here in connexion with ecological problems automatically apply to the genetic analogues
where they exist. Whereas, however, $\Psi$ may exceed unity, p is restricted to $0 \le p \le 1$. Negative
solutions are inadmissible in both types of problem.

The condition for a steady state is
$$\frac{d^2\Psi}{dz^2} + \Psi(1 - \Psi) = 0, \quad \text{where } z = x\gamma\sqrt2/a, \tag{15}$$
or in the case of a negative-logistic population (see §3-1)
$$\frac{d^2\Psi}{dz^2} - \Psi(1 + \Psi) = 0, \quad \text{where } z = xg\sqrt2/a. \tag{16}$$

The substitution $u = (\Psi_0 - \Psi)/\Psi_0(1 - \Psi_0)$, where $\Psi_0 = \Psi(z)|_{z=0}$,
converts (15) into
$$u'' = 1 + \lambda u - \tfrac14(1 - \lambda^2)u^2, \quad \text{where } \lambda = 2\Psi_0 - 1. \tag{17}$$












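Case (i). $c = \gamma^2$, $0 < \Psi_0 < 1$, $d\Psi/dz|_{z=0} = 0$.

Since $\dfrac{d^2u}{dz^2} = \dfrac12 \dfrac{d}{du}\left(\dfrac{du}{dz}\right)^2$, we find on integrating (17), subject to the condition $du/dz = 0$ when
$u = 0$,
$$\frac12\left(\frac{du}{dz}\right)^2 = u + \tfrac12\lambda u^2 - \tfrac1{12}(1 - \lambda^2)u^3.$$
Hence
$$|z|/\sqrt2 = \int_0^{\sqrt u} \{(1 - \alpha^2\tau^2)(1 + \beta^2\tau^2)\}^{-1/2}\, d\tau,$$
where
$$4\alpha^2 = \left(\frac{4 - \lambda^2}{3}\right)^{1/2} - \lambda \quad \text{and} \quad 4\beta^2 = \left(\frac{4 - \lambda^2}{3}\right)^{1/2} + \lambda.$$

In terms of Legendre’s elliptic integrals, of which there are tables (1825), we then have the
solution
$$z\rho\, 2^{-1/2} = F(k, \tfrac12\pi) - F(k, \arccos\alpha\sqrt u),$$
where $\rho^2 = \alpha^2 + \beta^2$ and $k = \beta/\rho = \beta\sqrt2\left(\dfrac{4 - \lambda^2}{3}\right)^{-1/4}$.

The inversion of this result leads to a simpler expression in terms of the Jacobian elliptic
function sd (y) denoting sin am y/Δ am y. We thereby write
$$\sqrt u = \mathrm{sd}(2^{-1/2}\rho z, k)/\rho. \tag{18}$$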
In practice, however, it is no more troublesome to compute a particular solution by iteration.
For under the condition $d\Psi/dz|_{z=0} = 0$ we may write equation (15) in the form
$$\Psi(z) = \Psi(0) - \int_0^z d\tau \int_0^\tau \Psi(s)\{1 - \Psi(s)\}\, ds.$$

Approximate values of $\Psi$ are substituted on the right and the integrations performed
numerically as in §3-9.
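
A minimal sketch of such an iteration is given below. It assumes a uniform grid, the trapezoidal rule for both integrations and a constant starting guess; none of these choices is prescribed by the text, and the function name is introduced here.

```python
import math

def iterate_steady_state(psi0, z_max, steps=2000, sweeps=40):
    """Successive substitution for Psi(z) = Psi(0) - int_0^z dtau int_0^tau Psi(1 - Psi) ds."""
    h = z_max / steps
    psi = [psi0] * (steps + 1)               # crude starting guess: constant Psi(0)
    for _ in range(sweeps):
        g = [p * (1.0 - p) for p in psi]
        inner = [0.0] * (steps + 1)          # inner[i]: integral of g from 0 to z_i
        outer = [0.0] * (steps + 1)          # outer[i]: integral of inner from 0 to z_i
        for i in range(1, steps + 1):
            inner[i] = inner[i - 1] + 0.5 * h * (g[i - 1] + g[i])
            outer[i] = outer[i - 1] + 0.5 * h * (inner[i - 1] + inner[i])
        psi = [psi0 - o for o in outer]
    return psi

profile = iterate_steady_state(psi0=1e-6, z_max=math.pi / 2)
print(profile[-1] / profile[0])   # near zero: Psi vanishes close to z = pi/2 when Psi(0) is tiny
```

With a vanishingly small central density the computed profile falls to zero near z = π/2, in keeping with the critical length of §3-3 (iii).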

If we make $\Psi_0$ vanishingly small and proceed to determine $z_0$ such that $\Psi(z_0) = 0$, we find
that $u(z_0) \to 1$ and $\rho \to 2^{-1/2}$. Equation (18) passes formally into $2^{-1/2} = \sin\tfrac12 z_0$, confirming the
result $x_0 = \tfrac12\pi a/\gamma\sqrt2$ given in §3-3 (iii).


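Case (ii). $c = \gamma^2$, $1 < \Psi_0 < \tfrac32$, $d\Psi/dz|_{z=0} = 0$.

Following the same procedure as before we find
$$|z|/\sqrt2 = \int_0^{\sqrt u} \{(1 + \alpha^2\tau^2)(1 + \beta^2\tau^2)\}^{-1/2}\, d\tau,$$
where
$$4\alpha^2 = \lambda - \left(\frac{4 - \lambda^2}{3}\right)^{1/2} \quad \text{and} \quad 4\beta^2 = \lambda + \left(\frac{4 - \lambda^2}{3}\right)^{1/2}.$$
Hence
$$2^{-1/2}\beta z = F(k, \arctan\beta\sqrt u), \quad \text{where } k = \left(\frac{4 - \lambda^2}{3}\right)^{1/4}\Big/\beta\sqrt2. \tag{19}$$
Alternatively $\sqrt u = \mathrm{tn}(2^{-1/2}\beta z, k)/\beta$.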








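Since
$$F(k, \theta) = \int_0^{\sin\theta} \{(1 - \tau^2)(1 - k^2\tau^2)\}^{-1/2}\, d\tau,$$
$$F(1, \theta) = \operatorname{arg\,tanh}(\sin\theta) = \mathrm{gd}^{-1}\theta.$$
In the important case when $\Psi_0$ is close to 1, so that $\beta^2$ is near $\tfrac12$ and k near 1, we find that
(19) passes into $\tfrac12 z = \mathrm{gd}^{-1}(\arctan\sqrt{\tfrac12 u})$ or $\sinh\tfrac12 z = \sqrt{\tfrac12 u}$,
with the result that $\cosh z = 1 + u = (\Psi - 1)/(\Psi_0 - 1)$,
a conclusion which may be conjectured directly from the differential equation itself.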
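
One way in which that conjecture might run is sketched below, assuming only that the departure of the density from unity stays small. The symbol $\varphi$ is introduced here for illustration and does not appear in the text.

```latex
\begin{align*}
\Psi &= 1 + \varphi, \qquad |\varphi| \ll 1, \\
\frac{d^2\varphi}{dz^2} &= -\Psi(1 - \Psi) = (1 + \varphi)\,\varphi \approx \varphi, \\
\varphi(0) &= \Psi_0 - 1, \quad \varphi'(0) = 0
\;\Longrightarrow\; \varphi = (\Psi_0 - 1)\cosh z .
\end{align*}
```

To first order this gives $(\Psi - 1)/(\Psi_0 - 1) = \cosh z$, the relation stated above.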


Other cases. Solutions in terms of elliptic functions are obtained in a similar manner after
making appropriate substitutions. No special treatment is required in the case of a negative-logistic
population maintained by immigration in a region of extinction. We merely write
$\Psi + 1$ instead of $\Psi$ in the corresponding solution for the positive logistic case.

3-5. Malthusian populations in two-dimensional habitats. The fundamental equation
expressing the combined effect of diffusion and geometric increase in a habitat throughout
which conditions are constant is
$$\frac{\partial\Psi}{\partial t} = \tfrac14 a^2\nabla^2\Psi + c\Psi(x, y, t). \tag{20}$$

In radially symmetrical cases we may write
$$\frac{\partial\Psi}{\partial t} = \tfrac14 a^2\left(\frac{\partial^2\Psi}{\partial r^2} + \frac1r\frac{\partial\Psi}{\partial r}\right) + c\Psi(r, t), \tag{21}$$
where $r^2 = x^2 + y^2$.
Case (i). A circular region of multiplication enclosed in a zone of absolute extinction:
$c = \gamma^2$ in $0 \le r < r_0$, $0 \le t < \infty$; $\Psi(r_0, t) = 0$.

To ensure that the solution is finite at the origin we impose the further condition $\partial\Psi/\partial r|_{r=0} = 0$.

The arbitrary initial state may be expressed as a Bessel series
$$\Psi(r, 0) = \Sigma_s A_s J_0(rj_s/r_0),$$
where $j_s$ is the sth zero of $J_0$ and $A_s$ is determined in the usual Fourier manner by reason of
the orthogonality relation
$$\int_0^1 u J_0(uj_s) J_0(uj_{s'})\, du = 0 \quad (s \ne s').$$
The solution is
$$\Psi(r, t) = \Sigma_s A_s e^{k_s t} J_0(rj_s/r_0),$$
where $k_s = \gamma^2 - \tfrac14 a^2 j_s^2/r_0^2$.

Clearly $\Psi \sim A_1 e^{k_1 t} J_0(rj_1/r_0)$, tending to 0 or $\infty$ as $r_0 \lessgtr j_1 a/2\gamma$. It appears again that habitats
below a certain critical size, depending on the conditions, are insufficient to maintain a species 
undergoing unrestricted random dispersal. 
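
The critical radius $j_1 a/2\gamma$ may be evaluated without tables. The sketch below computes $J_0$ from its power series and locates its first zero by bisection; the bracket [2, 3], the number of series terms and the values of a and γ are assumptions made for illustration.

```python
def bessel_j0(z, terms=40):
    """Power series J0(z) = sum_k (-1)^k (z^2 / 4)^k / (k!)^2, adequate for moderate z."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(z * z / 4.0) / ((k + 1) ** 2)
    return total

def first_zero_j0(lo=2.0, hi=3.0, iters=60):
    """Bisection for the first positive zero j_1 of J0, assumed bracketed by [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bessel_j0(lo) * bessel_j0(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def critical_radius(a, gamma):
    """Critical radius r0 = j_1 * a / (2 * gamma) of a circular region of multiplication."""
    return first_zero_j0() * a / (2.0 * gamma)

j1 = first_zero_j0()
print(j1, critical_radius(100.0, 0.2))   # a and gamma are illustrative values
```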


Case (ii). Let $(\xi, \eta)$ be the co-ordinates of the place of origin of an individual of the parental
generation for which the density function in space is $\Psi(\xi, \eta)$. Let the probability that
a particular one of its offspring originates in the interval $(x \pm \tfrac12 dx;\ y \pm \tfrac12 dy)$ be
$$\alpha(x - \xi, y - \eta \mid a^2)\, dx\, dy.$$











Let the probability that such an offspring reaches maturity be P(x, y). For most simple,
small populations P may be taken independent of $\Psi$. Then if on the average the number of
original offspring per parent is $\nu$ we may define a viability function $V(x, y) = \nu P(x, y)$. The
condition for a steady state from generation to generation is the integral equation
$$\Psi(x, y) = V(x, y) \iint \Psi(\xi, \eta)\, \alpha(x - \xi, y - \eta \mid a^2)\, d\xi\, d\eta. \tag{22}$$
For the purpose of the present simplified model we take $\alpha$ to be the function
$$\pi^{-1} a^{-2} \exp\{-[(x - \xi)^2 + (y - \eta)^2]/a^2\},$$
a choice justified by previous arguments. The function V(x, y) needs to be positive for all
values of the variable, and preferably continuous. It will then be seen that the choice
$V(x, y) = \exp\{(b^2 - x^2 - y^2)/v\}$ not only gives a reasonable spatial distribution of viability,
but also renders the integral equation tractable. Viability is greatest at the origin,
$V_0 = \exp\{b^2/v\}$. The circle $x^2 + y^2 = b^2$ divides the plane into two parts, and in the absence
of dispersal, the population inside the circle would increase and the population outside would
decline.

The integral equation (22) then has a solution of the form
$$\Psi(\xi, \eta) = A\pi^{-1}\theta^{-1} \exp\{-(\xi^2 + \eta^2)/\theta\}. \tag{23}$$
Since the double integral is the product of independent integrals and since
$$\int_{-\infty}^{\infty} \alpha(\xi \mid \theta_1)\, \alpha(x - \xi \mid \theta_2)\, d\xi = \alpha(x \mid \theta_1 + \theta_2),$$
where $\alpha$ is a normal function, we find on substituting (23) into (22) and equating coefficients
that
$$\frac1\theta = \frac1v + \frac1{\theta + a^2} \quad \text{and} \quad \frac1\theta = \frac{V_0}{\theta + a^2}, \quad \text{where } \log V_0 = b^2/v.$$
From these it follows that $b^2 = a^2 V_0 \log V_0/(V_0 - 1)^2$.


In the sense indicated earlier, b is the radius of the region of multiplication. Unless this value 
is attained the population must decline to extinction. 
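
The two conditions obtained by equating coefficients can be solved in closed form and checked numerically. The sketch below assumes a peak viability $V_0 > 1$; the function name and the numerical values are introduced here for illustration.

```python
import math

def steady_state_parameters(a, v0):
    """Given dispersal scale a and peak viability v0 > 1, return (theta, v, b)."""
    theta = a * a / (v0 - 1.0)                 # from 1/theta = v0 / (theta + a^2)
    v = v0 * a * a / (v0 - 1.0) ** 2           # from 1/theta = 1/v + 1/(theta + a^2)
    b = math.sqrt(v * math.log(v0))            # from log v0 = b^2 / v
    return theta, v, b

a, v0 = 100.0, 2.0                             # illustrative values only
theta, v, b = steady_state_parameters(a, v0)
print(theta, v, b)
```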


Case (iii). An infinite zone of extinction around a single circular area of multiplication:

$c = -g^2$ in $r_0 \le r < \infty$, $\Psi(r_0) = B$, $\Psi(\infty) = 0$.

The steady state is given by
$$\frac{d^2\Psi}{dr^2} + \frac1r\frac{d\Psi}{dr} - m^2\Psi = 0, \quad \text{where } m = 2g/a. \tag{24}$$
The solution is
$$\Psi(r) = A\int_0^\infty e^{-rm\cosh u}\, du = AK_0(rm),$$
where $K_0$ is the modified Bessel function of the second kind of order zero, and $A = B/K_0(r_0 m)$.
It may be noted that $K_0(z) \sim e^{-z}(\pi/2z)^{1/2}$ for large z. When $r_0$ is very large the problem reduces
to a linear one (§3-3(i)), the radial component of the R.M.S.D./generation being $a/\sqrt2$.

As a numerical illustration suppose that in the outer zone the population shows a natural 
tendency to decline at a rate of very nearly 4% per generation (g = 0.2). If the radius of the
reservoir is 1000 yards and the R.M.S.D./generation 100 yards, then the population density 





mple, 
ber of 
. The 


(22) 


for all 
shoice 
bility, 
rigin, 
sence 
would 


(23) 


cients 


value 


‘tion: 


(24) 


(74). 
educes 


atural 
of the 
lensity 





J. G. SKELLAM 207 


at a point 1000 yards outside the boundary compared with that at the boundary will be
approximately $e^{-2000m}\, 2000^{-1/2}/(e^{-1000m}\, 1000^{-1/2}) = 0.01295$, in close agreement with the more
accurate value 0.01312 based on the tables of $K_0$ given in Watson (1944).
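
Both figures can be reproduced from the integral form of $K_0$ quoted above. The sketch evaluates that integral by the trapezoidal rule; the truncation point and the step count are assumptions chosen for the arguments arising here, not a general-purpose routine.

```python
import math

def bessel_k0(z, upper=12.0, steps=12000):
    """K0(z) as the integral of exp(-z * cosh(u)) over u >= 0, by the trapezoidal rule."""
    h = upper / steps
    total = 0.5 * (math.exp(-z) + math.exp(-z * math.cosh(upper)))
    for i in range(1, steps):
        total += math.exp(-z * math.cosh(i * h))
    return total * h

g, a = 0.2, 100.0            # decline parameter and R.M.S.D./generation (yards)
m = 2.0 * g / a
r0, r = 1000.0, 2000.0       # boundary radius and a point 1000 yards outside it

exact_ratio = bessel_k0(r * m) / bessel_k0(r0 * m)
asymptotic_ratio = math.exp(-(r - r0) * m) * math.sqrt(r0 / r)
print(exact_ratio, asymptotic_ratio)
```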


Case (iv). A circular region of extinction:
$c = -g^2$ in $0 \le r \le r_0$, $\Psi'(0) = 0$, $\Psi(r_0) = B$.
Solution: $\Psi(r) = BI_0(rm)/I_0(r_0 m)$,
where $I_0$ is the modified Bessel function of the first kind of order zero.
Case (v). An annulus of multiplication in a region of absolute extinction:
$c = \gamma^2$ in $r_a \le r \le r_b$, $\Psi(r_a) = \Psi(r_b) = 0$.
A stationary non-negative solution exists for every fixed value $r_a$ provided that the value of
$r_b$ is made to depend on $r_a$. If $\mu$ denotes $2\gamma/a$ the required solution is
$$\Psi(r) = A[J_0(\mu r)Y_0(\mu r_a) - J_0(\mu r_a)Y_0(\mu r)],$$
where $Y_0$ is the Bessel function of the second kind (of Weber’s type) of order zero.
If $h = r_b/r_a$, it may be seen that $\mu r_a$ is the first root ($x_0$) of the equation
$$J_0(x)Y_0(hx) - J_0(hx)Y_0(x) = 0. \tag{25}$$

The relation between $r_b/r_a$ and $\mu(r_b - r_a)$ is that between h and $(h - 1)x_0$, the form in which
the zeros of (25) are tabulated in Jahnke & Emde (1945). As $r_a$ increases from zero, $\mu(r_b - r_a)$
moves from $j_1$ to $\pi$ and the problem rapidly degenerates to the linear case.

3-6. Logistic populations in two-dimensional habitats. The combined effect of diffusion and
logistic growth in radially symmetrical habitats may be expressed by
$$\frac{\partial\Psi}{\partial t} = \gamma^2\left(\frac{\partial^2\Psi}{\partial z^2} + \frac1z\frac{\partial\Psi}{\partial z} + \Psi(1 - \Psi)\right), \tag{26}$$
where $z = 2r\gamma/a$. In order to describe stationary states we shall require solutions of the
non-linear differential equation
$$\frac{d^2\Psi}{dz^2} + \frac1z\frac{d\Psi}{dz} + \Psi(1 - \Psi) = 0, \tag{27}$$
especially those that are finite at the origin, in which case
$$\left(\frac{d\Psi}{dz}\right)_{z=0} = 0. \tag{28}$$


The equation, not being reducible to a Painlevé transcendent, is not free from movable 
critical points (Valiron, 1945, pp. 293-4). The general march of solutions finite at the origin 
is shown in Fig. 3. Since $\Psi(1 - \Psi)$ is positive if and only if $0 < \Psi < 1$, it follows from the
differential equation that the turning points of all solutions in the strip $0 < \Psi < 1$ are maxima.
Similarly, the only turning points for which $\Psi > 1$ or $\Psi < 0$ are minima.

The spatial distribution of population density is illustrated by AB in the case of a region of
extinction, by CD in a circular region of multiplication, and by EF in the case of an annular
region of multiplication.

Particular solutions of (27) subject to stated boundary conditions may be computed
without much labour by the iterative method outlined in §3-9.












3-7. The function Q(z; q) is defined as the solution of equation (27) subject to condition (28)
and the relation Q(0; q) = q. We are concerned here with the real variable only.
If $U(z) = (q - Q(z; q))/q(1 - q)$, the differential equation satisfied by U is
$$zU'' + U' = z[1 + \lambda U - \tfrac14(1 - \lambda^2)U^2], \quad \text{where } \lambda = 2q - 1. \tag{29}$$

















Fig. 3 


Hence U tends to $1 - J_0(z)$ as $q \to 0$ and to $I_0(z) - 1$ as $q \to 1$. For computational purposes we
may therefore seek an approximation to U in the immediate neighbourhood of z = 0 in the
form
$$U = S(z) - 4q(1 - q)\,G(z),$$
where
$$S(z) = \sum_{n=1}^{\infty} \lambda^{n-1} z^{2n}/2^2 4^2 \ldots (2n)^2$$
may be regarded as a rough approximation, being
$$\begin{aligned} &= [I_0(z\sqrt\lambda) - 1]/\lambda &&\text{when } \lambda > 0 \\ &= [1 - J_0(z\sqrt{|\lambda|})]/|\lambda| &&\text{when } \lambda < 0 \\ &= \tfrac14 z^2 &&\text{when } \lambda = 0, \end{aligned}$$
and where
$$G(z) = \Sigma\, C_{2n} z^{2n}/2^2 4^2 \ldots (2n)^2$$
is a series of correction terms whose coefficients $C_{2n}$ are to be determined. Only even powers
are required, as may be seen either by differentiating (29) an even number of times or by
considering the function $U(-z)$, which is a solution of (29) with the same initial
conditions. By differentiating 2n + 1 times by Leibnitz’s theorem, making z = 0, and noting
that $U_n = d^n U(z)/dz^n|_{z=0} = 0$ if n is odd, we obtain
$$(2n + 2)\,U_{2n+2} = (2n + 1)\Big[\lambda U_{2n} - \tfrac14(1 - \lambda^2) \sum_{j=1}^{n-1} \binom{2n}{2j} U_{2j} U_{2n-2j}\Big].$$
By substituting $A_{2j}/2^2 \ldots (2j)^2$ for $U_{2j}/(2j)!$, where $A_{2n} = \{\lambda^{n-1} - (1 - \lambda^2)C_{2n}\}$,
we find
$$C_{2n+2} = \lambda C_{2n} + \tfrac14 \sum_{j=1}^{n-1} \binom{n}{j}^2 A_{2j} A_{2n-2j}.$$
Thereby we obtain in succession:

$C_2 = C_4 = 0$, $C_6 = 1$, $C_8 = 11\lambda/2$, $C_{10} = (61\lambda^2 - 16)/2$,

$C_{12} = \lambda(847\lambda^2 - 507)/4$, $C_{14} = (3820\lambda^4 - 3677\lambda^2 + 488)/2$,

$C_{16} = \lambda(176{,}245\lambda^4 - 234{,}718\lambda^2 + 67{,}857)/8$,

$C_{18} = (2{,}536{,}967\lambda^6 - 4{,}317{,}594\lambda^4 + 1{,}978{,}563\lambda^2 - 162{,}816)/8$.
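
The coefficients listed may be regenerated by exact rational arithmetic. The sketch below takes the recurrence to be $C_{2n+2} = \lambda C_{2n} + \tfrac14\sum_{j=1}^{n-1}\binom{n}{j}^2 A_{2j}A_{2n-2j}$ with $A_{2n} = \lambda^{n-1} - (1 - \lambda^2)C_{2n}$, which is our reading of the relations above; polynomials in λ are held as coefficient lists (lowest power first), and the helper names are introduced here.

```python
from fractions import Fraction
from math import comb

def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [x + y for x, y in zip(p, q)]

def trim(p):
    """Drop trailing zero coefficients so that polynomials can be compared."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def correction_coefficients(n_max):
    """Return {2n: C_2n} for 2n = 2, 4, ..., 2 * n_max as polynomials in lambda."""
    one_minus_l2 = [Fraction(1), Fraction(0), Fraction(-1)]
    C = {2: [Fraction(0)]}
    A = {2: [Fraction(1)]}                     # A_2 = 1
    for n in range(1, n_max):
        acc = [Fraction(0)]
        for j in range(1, n):
            term = poly_mul(A[2 * j], A[2 * n - 2 * j])
            acc = poly_add(acc, [Fraction(comb(n, j) ** 2, 4) * c for c in term])
        C[2 * n + 2] = poly_add(poly_mul([Fraction(0), Fraction(1)], C[2 * n]), acc)
        lam_pow = [Fraction(0)] * n + [Fraction(1)]          # lambda^n
        A[2 * n + 2] = poly_add(lam_pow, [-c for c in poly_mul(one_minus_l2, C[2 * n + 2])])
    return C

C = correction_coefficients(7)
for order in (6, 8, 10, 12, 14):
    print(order, trim(C[order]))
```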


Computed values of $U/z^2$ are given in Table 1 for equidistant values of q and of $z^2$. With this
arrangement interpolation can be carried out reasonably safely.


Table 1. $[q - Q(z; q)]/z^2 q(1 - q)$

z^2 \ q | 0.0    | 0.1    | 0.2    | 0.3    | 0.4    | 0.5    | 0.6    | 0.7    | 0.8    | 0.9    | 1.0
   0    | 0.2500 | 0.2500 | 0.2500 | 0.2500 | 0.2500 | 0.2500 | 0.2500 | 0.2500 | 0.2500 | 0.2500 | 0.2500
   1    | 0.2348 | 0.2376 | 0.2405 | 0.2435 | 0.2466 | 0.2496 | 0.2528 | 0.2559 | 0.2592 | 0.2626 | 0.2661
   2    | 0.2204 | 0.2255 | 0.2309 | 0.2364 | 0.2423 | 0.2483 | 0.2546 | 0.2613 | 0.2682 | 0.2754 | 0.2831
   3    | 0.2068 | 0.2138 | 0.2211 | 0.2289 | 0.2373 | 0.2461 | 0.2556 | 0.2658 | 0.2767 | 0.2884 | 0.3010
   4    | 0.1940 | 0.2023 | 0.2113 | 0.2211 | 0.2317 | 0.2432 | 0.2558 | 0.2696 | 0.2847 | 0.3014 | 0.3199
   5    | 0.1819 | 0.1913 | 0.2016 | 0.2130 | 0.2255 | 0.2395 | 0.2551 | 0.2725 | 0.2922 | 0.3145 | 0.3399
   6    | 0.1705 | 0.1806 | 0.1919 | 0.2046 | 0.2189 | 0.2351 | 0.2535 | 0.2746 | 0.2990 | 0.3275 | 0.3609
   7    | 0.1597 | 0.1704 | 0.1824 | 0.1961 | 0.2118 | 0.2300 | 0.2511 | 0.2758 | 0.3052 | 0.3404 | 0.3831
   8    | 0.1496 | 0.1605 | 0.1731 | 0.1876 | 0.2044 | 0.2243 | 0.2479 | 0.2761 | 0.3105 | 0.3530 | 0.4065
   9    | 0.1400 | 0.1511 | 0.1639 | 0.1790 | 0.1969 | 0.2181 | 0.2439 | 0.2759 | 0.3150 | 0.3654 | 0.4312
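
Individual entries of Table 1 may be recomputed from the power series of U. The sketch below works directly with the coefficients $A_{2n}$, assuming the recurrence $A_{2n+2} = \lambda A_{2n} - \tfrac14(1 - \lambda^2)\sum_{j=1}^{n-1}\binom{n}{j}^2 A_{2j}A_{2n-2j}$ implied by (29); the number of terms retained is an arbitrary choice, and the function name is introduced here.

```python
from math import comb

def u_over_z2(q, z2, n_terms=30):
    """Evaluate U / z^2 from the power series of (29); z2 stands for z^2."""
    lam = 2.0 * q - 1.0
    A = [0.0, 1.0]                     # A[n] holds A_{2n}; A_2 = 1
    for n in range(1, n_terms):
        conv = sum(comb(n, j) ** 2 * A[j] * A[n - j] for j in range(1, n))
        A.append(lam * A[n] - 0.25 * (1.0 - lam * lam) * conv)
    total, weight = 0.0, 1.0           # weight = z^(2n) / (2^2 4^2 ... (2n)^2)
    for n in range(1, n_terms + 1):
        weight *= z2 / (2.0 * n) ** 2
        total += A[n] * weight
    return total / z2

for q, z2 in ((0.0, 1.0), (0.5, 4.0), (1.0, 1.0)):
    print(q, z2, round(u_over_z2(q, z2), 4))
```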












































3-8. The stability of stationary states. In the case of Malthusian populations with positive
parameter (§§3-3 (iii), 3-5 (i)), it was apparent that the stationary state was an unstable one. 
Though we were able to draw valid conclusions with regard to the critical size of a habitat for 
a population on the point of extinction, the model is inadequate in that it does not provide 
a stable stationary state for a population which is safely established. The assumption of 
logistic growth is free from this objection. 

For let Q be a solution of (27) subject to
$$Q(z_a) = A, \quad Q(z_b) = B \quad (0 < z_a < z_b),$$
and let $\Psi(z, t)$ satisfy (26) subject to
$$\Psi(z_a, t) = A, \quad \Psi(z_b, t) = B.$$
(The proof for the special case where the conditions at $z_a = 0$ are $Q' = \partial\Psi/\partial z = 0$ differs only
in trivial details and is omitted.)

Then from our knowledge of the physical situation we should expect $\Psi \to Q$ as $t \to \infty$.
Provided that the necessary biological condition holds, that $Q(z) \ge 0$ in $z_a \le z \le z_b$ (though
not zero everywhere), it can at least be shown that Q describes a stable equilibrium.


The function $\Delta(z, t) = \Psi(z, t) - Q(z)$ satisfies
$$\frac1{\gamma^2}\frac{\partial\Delta}{\partial t} = \frac{\partial^2\Delta}{\partial z^2} + \frac1z\frac{\partial\Delta}{\partial z} + \Delta(1 - 2Q - \Delta). \tag{30}$$
In dealing with questions of stability we need only consider such $\Delta(z, 0)$ as are infinitesimal
variations of Q. So long as $\Delta$ remains small, equation (30) remains effectively
$$\frac1{\gamma^2}\frac{\partial\Delta}{\partial t} = \frac{\partial^2\Delta}{\partial z^2} + \frac1z\frac{\partial\Delta}{\partial z} + \Delta(1 - 2Q), \tag{31}$$
with the conditions $\Delta(z_a, t) = \Delta(z_b, t) = 0$ $(0 \le t < \infty)$.
We are therefore led to consider solutions of
$$\frac{d^2f}{dz^2} + \frac1z\frac{df}{dz} + (\lambda + 1 - 2Q)f = 0 \tag{32}$$
which satisfy $f(z_a) = f(z_b) = 0$, with Q(z) as given. The Sturm-Liouville theorems are
immediately applicable. Reduction to the standard form adopted by Titchmarsh (1946)
may be brought about by the substitution $v = f\sqrt z$, though in this case there are advantages
in following Ince (1927) and expressing (32) as
$$\frac{d}{dz}(zf') + z(\lambda + 1 - 2Q)f = 0. \tag{33}$$

The values of $\lambda$ for which solutions are possible will be denoted by $\lambda_i$ (i = 0, 1, 2, ...) and the
corresponding solutions by $f_i(z)$, the suffix being the number of zeros between $z_a$ and $z_b$. We
now prove that $\lambda_0$ is positive.

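By multiplying (33) by $\Omega$, substituting $f_0$ for $f$, and integrating we find

$$\big[z\Omega f_0'\big]_{z_a}^{z_b} - \int_{z_a}^{z_b} z\Omega' f_0'\,dz + (\lambda_0 + 1)\int_{z_a}^{z_b} z f_0 \Omega\,dz - 2\int_{z_a}^{z_b} z f_0 \Omega^2\,dz = 0.$$

By subtracting from this a similar relationship obtained in the corresponding manner from
(27) with $\Omega$ as the dependent variable, we then have

$$\lambda_0 \int_{z_a}^{z_b} z f_0 \Omega\,dz = \int_{z_a}^{z_b} z f_0 \Omega^2\,dz - \big[\Omega z f_0'\big]_{z_a}^{z_b}. \tag{34}$$

The sign of $f_0'(z_a)$ = minus sign of $f_0'(z_b)$ = sign of both integrands, provided that $\Omega$ is positive
throughout $(z_a, z_b)$. Hence

$$\lambda_0 \ge \int_{z_a}^{z_b} z f_0 \Omega^2\,dz \Big/ \int_{z_a}^{z_b} z f_0 \Omega\,dz$$

and is positive. By Sturm’s oscillation theorems, $\lambda_0 < \lambda_1 < \lambda_2 \ldots$, so that all characteristic
numbers are positive.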


From (33) we have

$$f_j\,(z f_i')' + z(\lambda_i + 1 - 2\Omega) f_i f_j = 0$$

and a similar relation with $i$ and $j$ interchanged. On integrating (the first term by parts)
and subtracting, we find

$$(\lambda_i - \lambda_j)\int_{z_a}^{z_b} z f_i f_j\,dz = 0. \tag{35}$$

The fundamental set of solutions constitute an orthogonal system, and systems of the above
kind are known to be closed (Ince, 1927). Any arbitrary function such as $\Lambda(z, 0)$ has the










formal development $\Lambda = \sum A_i f_i(z)$, where the coefficients are given in the usual Fourier
manner by

$$A_i = \int_{z_a}^{z_b} z\,\Lambda(z) f_i(z)\,dz \Big/ \int_{z_a}^{z_b} z f_i^2(z)\,dz.$$

If $\Lambda(z)$ is integrable over $(z_a, z_b)$ and of bounded variation in the neighbourhood of $z = \zeta$, it
may be shown that the Sturm-Liouville development given above converges to
$\tfrac12[\Lambda(\zeta + 0) + \Lambda(\zeta - 0)]$ at $z = \zeta$.

In problems of the type being discussed at present it is, however, permissible to assume the
less general condition that $\Lambda(z)$ has a continuous second derivative, and in this case the
series is absolutely and uniformly convergent in $(z_a, z_b)$ (Ince, 1927).

It now follows that the solution of (31) is 


A(z, t) = Y A,e~Ai*f,(z). 
i=0 
As $t \to \infty$, $\Lambda \to 0$, since all $\lambda_i > 0$.


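3·9. An iterative method. Consider, for example, the equation

$$\frac{d}{dz}(z\Psi') + z c_1(z)\Psi - z c_2(z)\Psi^2 = 0.$$

This may be written

$$\Psi(Z) = \Psi(b) + b\Psi'(b)\log(Z/b) - \int_b^Z \frac{dX}{X}\int_b^X (z c_1\Psi - z c_2\Psi^2)\,dz. \tag{36}$$

The limiting form of this equation as $b \to 0$, provided that $\Psi'(b)$ remains finite, is

$$\Psi(Z) = \Psi(0) - \int_0^Z \frac{dX}{X}\int_0^X (z c_1\Psi - z c_2\Psi^2)\,dz. \tag{37}$$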


If approximate values of $\Psi(z)$ are substituted on the right-hand side of (36) or (37) and the
integration performed numerically, we obtain greatly improved values on the left. It is an
easy matter to continue the solution step by step obtaining rough values as required by
simple graphical extrapolation. Because of the remarkable rapidity of convergence it is soon
found unnecessary to repeat the calculations for the earlier values (which automatically
reproduce themselves).

For numerical illustration let $b = 0$, $\Psi'(0) = 0$, $\Psi(0) = 0{\cdot}5$, $c_1(z) = c_2(z) = 1$, so that the
equation is

$$0{\cdot}5 - \Psi(Z) = \int_0^Z \frac{dX}{X}\int_0^X z\Psi(1 - \Psi)\,dz.$$

The initial rough values are arranged in column (ii) of Table 2. Using intervals of $z$ no smaller
than $\tfrac12$, and only the elementary integration formulae:

$$\int_0^h f(x)\,dx = \tfrac{1}{12}h\{5f(0) + 8f(h) - f(2h)\},$$

$$\int_{a-h}^{a+h} f(x)\,dx = \tfrac13 h[f(a-h) + 4f(a) + f(a+h)],$$

we find that a single application of the process reduces the sum of the squares of the errors
to less than 1/800 its original value.
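
One application of the process can be sketched in Python as follows. This is only an illustrative reading of the text, not the author's own working: the function names `cumulative_integral` and `iterate_once` are hypothetical, the running integral is assumed to be built from the first formula for the first interval and the second (Simpson) formula thereafter, and the quotient in column (v) is assumed to take its limiting value 0 at $z = 0$.

```python
def cumulative_integral(f, h):
    """Running integral of equally spaced ordinates f[0], f[1], ... from 0.

    Assumption: the first interval uses h/12 * (5 f0 + 8 f1 - f2); every later
    point adds a Simpson strip over the two preceding intervals.
    """
    out = [0.0] * len(f)
    out[1] = h / 12.0 * (5.0 * f[0] + 8.0 * f[1] - f[2])
    for k in range(2, len(f)):
        out[k] = out[k - 2] + h / 3.0 * (f[k - 2] + 4.0 * f[k - 1] + f[k])
    return out


def iterate_once(z, psi, psi0=0.5):
    """One pass of the iteration for c1 = c2 = 1, b = 0, Psi'(0) = 0."""
    h = z[1] - z[0]
    integrand = [zi * p * (1.0 - p) for zi, p in zip(z, psi)]   # column (iii)
    inner = cumulative_integral(integrand, h)                   # column (iv)
    # Column (v): inner integral divided by X, taken as 0 at X = 0.
    quotient = [0.0] + [inner[k] / z[k] for k in range(1, len(z))]
    outer = cumulative_integral(quotient, h)                    # column (vi)
    return [psi0 - value for value in outer]                    # column (vii)


z_values = [0.5 * k for k in range(7)]
rough = [0.50, 0.48, 0.44, 0.36, 0.26, 0.13, 0.01]   # column (ii) of Table 2
improved = iterate_once(z_values, rough)
```

Under these assumptions the list `improved` agrees with column (vii) of Table 2 to within rounding; feeding it back into `iterate_once` repeats the process.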




The method can be applied even if the c’s are variable with finite discontinuities at a finite
number of points in $(b_1, b_2)$ provided that the interval of argument employed is not
unreasonably large. Given $\Psi(b_1)$ and $\Psi'(b_1)$ new values of $\Psi(z)$ can be established progressively
along $(b_1, b_2)$. Given $\Psi(b_1)$ and $\Psi(b_2)$, however, it appears necessary to plot several integral
curves radiating from the point $(b_1, \Psi(b_1))$ each with an assumed value of $\Psi'(b_1)$. After each
trial this value is systematically adjusted until the condition at $z = b_2$ is satisfied.










Table 2

(i)    (ii)        (iii)      (iv)            (v)           (vi)          (vii)        (viii)
z      Ψ (rough)   zΨ(1−Ψ)    ∫ col. (iii)    col. (iv)/X   ∫ col. (v)    0·5 − (vi)   Ψ (accurate)

0·0    0·50        0·00000    0·000000        0·000000      0·000000      0·50000      0·50000
0·5    0·48        0·12480    0·031333        0·062666      0·015711      0·48429      0·48438
1·0    0·44        0·24640    0·124267        0·124267      0·062489      0·43751      0·43761
1·5    0·36        0·34560    0·274000        0·182667      0·139445      0·36056      0·36060
2·0    0·26        0·38480    0·459867        0·229935      0·243301      0·25670      0·25681
2·5    0·13        0·28275    0·635258        0·254103      0·365530      0·13447      0·13462
3·0    0·01        0·02970    0·717450        0·239150      0·490884      0·00912      0·00924



































3·10. A more general problem. When the vital coefficients vary from place to place in an
entirely arbitrary manner the equation we have to consider is

$$\frac{\partial\Psi}{\partial t} = \tfrac14 a^2\nabla^2\Psi + c_1(x, y)\Psi - c_2(x, y)\Psi^2. \tag{38}$$

Orthodox analytical methods appear inadequate, even in one-dimensional or radially
symmetrical cases. In these cases an easy extension of the argument of §3·8 shows that
a sufficient but not necessary condition for the stability of solutions is that $c_2 > 0$ in the
habitat considered. Because of this, simple methods of numerical solution akin to ‘relaxation’
were tried out for such cases, but, because of the ultimate slowness of convergence, were
abandoned in favour of the method of §3·9.


3·11. Conditions at a common boundary. In the present treatment $\Psi$ is regarded as
continuous, and the c’s as continuous except at certain points. But this is only a convenient
abstraction. In nature there are discontinuities almost everywhere (see Denjoy, 1937, p. 8),
and it might be thought theoretically more desirable to construct our analytical model with
functions of a more general nature. The scientist, however, is not so much concerned with the
behaviour of $\Psi$ and $c$ at a particular point as with their mean value in a small neighbourhood
of that point.

Consider the situation at the point $z = b$ in a linear habitat where the ‘diffusivity’ is
uniform throughout but where the vital coefficients in $b' < z < b$ differ from those in $b < z < b''$.
(The case where one subinterval is a region of absolute extinction is excluded, and this is
best treated as a limiting case with $c_1$ tending to $-\infty$.) An appeal to our fundamental
assumptions will show that unless $\Psi$ is the same at $b - 0$ and $b + 0$, the rate of population
flow across the boundary would be infinitely great, proceeding in such a direction as to restore
the continuity of $\Psi$. Similarly, unless $\partial\Psi/\partial z$ is the same at $b - 0$ as at $b + 0$ the rate of change
of density in an infinitesimally small neighbourhood of $b$ would be infinitely great, proceeding
in such a way as to equalize the derivatives at $b - 0$ and $b + 0$. A similar argument holds in
a radially symmetrical habitat or the more general two-dimensional one.




When employing the iterative method of §3·9 it is not necessary to define the vital
coefficients at $z = b$ unless $b$ coincides with a tabular value of the argument. In such a case
it is desirable to define $c(b) = \tfrac12[c(b + 0) + c(b - 0)]$. The continuity of $c$ is tacitly implied in the
use of formulae of numerical integration.


4. REPRODUCTIVE CAPACITY AND POPULATION DENSITY 


4·1. A law of population growth. Strictly the logistic law is applicable only to populations
which are continuous in time. It will be shown that where competition is most marked
among seedlings distributed at random, a somewhat different law holds for annual plants
with the analogue of the logistic law as a limiting case.

The following symbols are used:

$H$ is a coefficient denoting the suitability of the habitat as a whole, and is resolvable into
separate components. For example, $H = Wsp/\Sigma$, where

$W$ is the number of spots of ground suitable in nature and size to support one plant of the
species concerned to maturity. Such a spot will be termed a cell. Cells may be isolated or
adjacent in groups of two or more. In the present simplified treatment, all cells are regarded
as of the same surface area, namely, one unit.

$\Sigma$ is an arbitrary area constructed to include all the cells. It is assumed that this area is
insulated in the sense that it receives no seeds from without.

$p$ denotes the proportion of seeds which actually fall in $\Sigma$. It is assumed that the probability
distribution of the seeds is even throughout $\Sigma$.

$s$ is the probability that a ‘seeded cell’ rears a seedling to maturity.

$\Gamma$ denotes reproductive capacity, the number of fertile seeds produced on the average
per plant.

$\chi$ denotes relative density, defined here as the ratio of actual population density to the
hypothetical maximum that the habitat could support, were every cell seeded. That is,
$\chi = N/Ws$, where $N$ is the actual population number.

In the interests of completeness it might be thought desirable to introduce more ecological
factors than are considered here. In most cases it will be found that the mathematical form
of the final result is unchanged, particularly when the appropriate modifications are
incorporated into the definitions.

The probability distribution of the number of seeds per cell will be Poissonian with
parameter $N\Gamma p/\Sigma = N\Gamma H/Ws = \Gamma H\chi$. The proportion of cells with at least one seed is
then $1 - e^{-\Gamma H\chi}$ (neglecting minor irregularities due to sampling). By reason of the definition
of $\chi$ this expression is the relative density of the resulting population. Using suffixes to
distinguish between the values of $\chi$ in successive generations, we then have the law

$$\chi_{n+1} = 1 - e^{-\Gamma H\chi_n}. \tag{39}$$

The ultimate stable value $\chi_\infty$ satisfies the equation

$$\log(1 - \chi) + \Gamma H\chi = 0. \tag{40}$$

Typical growth ‘curves’ are illustrated in Fig. 4. From (40) it will be seen that as $\chi \to 1$ from
lower values $\Gamma H \sim \log\frac{1}{1 - \chi} \to +\infty$, so that when $\chi$ is large a very considerable increase in
reproductive capacity is required to bring about a perceptible increase in the population.
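
The law (39) is easy to explore numerically. The following Python sketch is illustrative only: the function names are hypothetical, `gamma_h` stands for the product ΓH, and the stable value is found by simply repeating (39) from full occupation rather than by any method stated in the text.

```python
import math


def next_density(chi, gamma_h):
    """Relative density of the next generation, relation (39)."""
    return 1.0 - math.exp(-gamma_h * chi)


def trajectory(chi0, gamma_h, generations):
    """Successive relative densities chi_0, chi_1, ..., chi_generations."""
    values = [chi0]
    for _ in range(generations):
        values.append(next_density(values[-1], gamma_h))
    return values


def stable_density(gamma_h, tol=1e-12, max_iter=100000):
    """Non-zero root of (40), approached by repeating (39) from chi = 1.

    Since 1 - exp(-x) < x for x > 0, the density can only shrink towards zero
    when gamma_h <= 1, so 0.0 is returned in that case.
    """
    if gamma_h <= 1.0:
        return 0.0
    chi = 1.0
    for _ in range(max_iter):
        new_chi = next_density(chi, gamma_h)
        if abs(new_chi - chi) < tol:
            return new_chi
        chi = new_chi
    return chi


curve = trajectory(0.01, 2.0, 15)      # one growth curve, gamma_h = 2
limit = stable_density(2.0)            # close to 0.797
```

Starting the search at $\chi = 1$ is a design choice of this sketch: it keeps the iterate above the stable value, so the sequence decreases steadily towards it.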













4·2. Further applications. The relation between the area ‘covered’ by a randomly moving
insect and the distance traversed by that insect (Nicholson & Bailey, 1935) is closely analogous 
with the relation between area ‘seeded’ and the number of seeds produced by an annual 
plant. In many species of insect the population one year depends on the number of places 
suitable for oviposition encountered by the adult population of the previous year. In the 
simplest cases of this type, it is to be expected that relation (39) will be applicable. 














[Fig. 4. Growth curves for ΓH = 5 and ΓH = 2. Horizontal axis: generations (0 to 15); vertical axis: 0 to 1·0.]


Somewhat similar but more elaborate relationships have been deduced (see Jordan 
& Burrows, 1945; Zinsser & Wilson, 1932) in connexion with the dissemination of infectious 
diseases. 


4·3. Discrete logistic growth. The recurrence relationship (39) also arises in the solution of
a problem in evolutionary genetics (Skellam, 1948). Provided that $\Gamma H\chi$ is small we have as
a good approximation

$$\chi_{n+1} = 1 - \frac{1 - \tfrac12\Gamma H\chi_n}{1 + \tfrac12\Gamma H\chi_n}. \tag{41}$$

Since $\chi_\infty = 2(\Gamma H - 1)/\Gamma H$ we may express this approximation as

$$\chi_{n+1} - \chi_n = \tfrac12\Gamma H\,\chi_n(\chi_\infty - \chi_{n+1}),$$

corresponding to the differential equation (7a). The finite difference equation is of the
Riccati type (Milne-Thomson, 1933) and has the solution

$$\chi_n = \frac{\chi_0\chi_\infty}{\chi_0 + (\chi_\infty - \chi_0)(\Gamma H)^{-n}}. \tag{42}$$

This expression is in fact the logistic law for a discrete variable. For writing $N_n/N_\infty = \chi_n/\chi_\infty$,
$c = \log\Gamma H$, and denoting the independent variable by $t$ with its origin at the centre of
symmetry, we obtain the standard form $N(t) = N(\infty)/(1 + e^{-ct})$.

A simple guide to the values of $\Gamma H$ for which (41) is a fair approximation to (39) is afforded
by the comparison of the upper asymptotes in the two cases for the same fixed values of $\Gamma H$,
as shown in Table 3.

It will be seen that $\chi_\infty$ is an increasing function of $H$ and therefore of $W$, the number of
suitable spots in $\Sigma$. It therefore appears, other things being equal, that in the richer habitats













a greater proportion of the available cells are actually utilized. The inference may well be 
extended to perennial populations. 








Table 3

ΓH =       1·0258    1·0536    1·1157    1·277    2·0
By (40)    0·0500    0·1000    0·2000    0·400    0·797
By (41)    0·0504    0·1017    0·2074    0·433    1·000
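
The comparison in Table 3 can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, `gamma_h` stands for ΓH, and (41) is taken in the form $\chi_{n+1} = 1 - (1 - \tfrac12\Gamma H\chi_n)/(1 + \tfrac12\Gamma H\chi_n)$, whose upper asymptote is $2(\Gamma H - 1)/\Gamma H$.

```python
import math


def gamma_h_for_density(chi):
    """Value of gamma_h for which chi is the stable density under (40)."""
    return -math.log(1.0 - chi) / chi


def logistic_asymptote(gamma_h):
    """Upper asymptote of the approximate law (41)."""
    return 2.0 * (gamma_h - 1.0) / gamma_h


def logistic_step(chi, gamma_h):
    """One generation of the approximate law (41)."""
    half = 0.5 * gamma_h * chi
    return 1.0 - (1.0 - half) / (1.0 + half)


def logistic_closed_form(chi0, gamma_h, n):
    """Density after n generations from the closed-form solution (42)."""
    chi_inf = logistic_asymptote(gamma_h)
    return chi0 * chi_inf / (chi0 + (chi_inf - chi0) * gamma_h ** (-n))


# Rows of Table 3: exact asymptote by (40) against the asymptote by (41).
rows = []
for chi_exact in (0.05, 0.1, 0.2, 0.4):
    gh = gamma_h_for_density(chi_exact)
    rows.append((gh, chi_exact, logistic_asymptote(gh)))

# Check that stepping (41) five times matches the closed form (42).
chi = 0.01
for _ in range(5):
    chi = logistic_step(chi, 1.5)
closed = logistic_closed_form(0.01, 1.5, 5)
```

The two asymptotes stay close while ΓH is near 1 and separate as ΓH grows, which is the guide the table is meant to give.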






































4·4. Two species in competition. The classical case of two closely similar logistic species
competing for the same food has been studied by Gause (1935) and Volterra (1931). The case
we shall consider here is that of two closely related species of annual plant, S and S’,
competing in the same habitat. In order to investigate the extent to which a disadvantage in
direct competition may be offset by a superiority in reproductive capacity, we shall assume
the extreme condition that individual S’ seedlings always fail to establish themselves in
immediate competition with S seedlings (for example, the S seed may germinate earlier).
All other ecological factors will be regarded as the same for both populations.

Under these conditions the S population is not adversely affected by the presence of S’,
and we may write

$$\Gamma H\chi + \log(1 - \chi) = 0. \tag{43}$$

The number of cells unoccupied by S in the seedling stage is then $W(1 - \chi)$, and these are the
only ones available to S’.
It follows as in (4·2) that for an equilibrium

$$N' = sW(1 - \chi)\,[1 - e^{-N'\Gamma' p/\Sigma}]. \tag{44}$$

In order that the unit in terms of which we express the relative population density of S’ is
independent of $\chi$, we shall denote $N'/sW$ by $\chi'_\infty$. Equation (44) then becomes

$$\chi'_\infty = (1 - \chi)\,[1 - e^{-\Gamma' H\chi'_\infty}]. \tag{45}$$

Hence, from (43) and (45),

$$\Gamma'/\Gamma = \chi\log\Big(1 - \frac{\chi'_\infty}{1 - \chi}\Big) \Big/ \chi'_\infty\log(1 - \chi). \tag{46}$$

If now we allow $\chi'_\infty \to 0$ we find that $\Gamma'/\Gamma \to -\chi/\{(1 - \chi)\log(1 - \chi)\}$. This will be termed the
critical value of $\Gamma'/\Gamma$. Unless it is exceeded S’ cannot coexist with S. When S is not dense it








































Table 4a

χ                0·0+    0·2     0·4     0·6     0·8     0·9     0·99
Critical Γ’/Γ    1·0     1·12    1·31    1·64    2·49    3·91    21·5

Table 4b
(values of Γ’/Γ; rows give χ, columns give χ’∞)

χ \ χ’∞    0+       0·1      0·2      0·3      0·4      0·5      0·6
0·1        1·055    1·118    1·193    1·283    1·395    1·539    1·738
0·5        1·443    1·610    1·842    2·20     2·90     ∞        -








































is possible for S’ to survive (despite its extreme inferiority in direct competition) with only
a slightly superior reproductive capacity, but when S is very dense, it is impossible for S’ to
survive unless its reproductive capacity is many times greater than that of S. In the case of
very small populations the ‘safety level’ of $\Gamma'/\Gamma$ will be somewhat greater than the critical
value, because of the danger of random extinction. Table 4a gives the critical values of
$\Gamma'/\Gamma$ for various values of $\chi$. Table 4b gives the values of $\Gamma'/\Gamma$ required to maintain the S’
population at various levels in the two cases $\chi = 0{\cdot}1$ and $0{\cdot}5$.
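
The entries of Tables 4a and 4b follow directly from the ratio of reproductive capacities obtained from (43) and (45) and from its limit as the S’ density tends to zero. The Python sketch below is illustrative; the function names are hypothetical, `chi` is the relative density of S and `chi_prime` that of S’ in the same units.

```python
import math


def critical_ratio(chi):
    """Limit of the required ratio as the S' density tends to zero."""
    return -chi / ((1.0 - chi) * math.log(1.0 - chi))


def required_ratio(chi, chi_prime):
    """Ratio of reproductive capacities needed to hold S' at chi_prime.

    S' can occupy only the fraction 1 - chi of cells left free by S, so
    chi_prime must be smaller than 1 - chi.
    """
    if not 0.0 < chi_prime < 1.0 - chi:
        raise ValueError("chi_prime must lie strictly between 0 and 1 - chi")
    numerator = chi * math.log(1.0 - chi_prime / (1.0 - chi))
    return numerator / (chi_prime * math.log(1.0 - chi))


table_4a = {chi: critical_ratio(chi) for chi in (0.2, 0.4, 0.6, 0.8, 0.9, 0.99)}
table_4b_row = [required_ratio(0.1, cp) for cp in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6)]
```

The restriction on `chi_prime` corresponds to the infinite and blank entries in the second row of Table 4b.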


4·5. Two species and two habitats. As a simple illustration of the application of these
relations consider two habitats with coefficients $H_1 = 0{\cdot}001$ and $H_2 = 0{\cdot}01$, the same for both
species, and let $\Gamma = 400$ and $\Gamma' = 2000$. The seeds of S’, being more numerous, might well
have smaller food reserves. Whatever the cause it is assumed as before that S’ seedlings do
not establish themselves in immediate competition with S seedlings.

Since $\Gamma H_1 = 0{\cdot}4 < 1$, S cannot survive in the first habitat but does so in the second with
$\chi = 0{\cdot}9803$. Hence $-\chi/\{(1 - \chi)\log(1 - \chi)\} = 12{\cdot}7 > \Gamma'/\Gamma = 5$. As a result S’ cannot compete
successfully in the second habitat though in the first it appears safely established with
$\chi' = 0{\cdot}797$.

In ecological terms we could say that though the requirements of both species are the same, 
the species with the greater reproductive capacity is better suited to the poorer habitat, and 
the species with the greater ability to establish itself against competition at the seedling 
stage is better adapted to life in the richer habitat. 
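
The arithmetic of this illustration can be checked with a short Python sketch. The helper names are hypothetical and the stable density is found by repeated application of (39), which is an assumption of this sketch rather than the method used in the text; the computed values agree with those quoted to about three decimal places.

```python
import math


def stable_density(gamma_h, iterations=5000):
    """Stable relative density under (39); zero when gamma_h <= 1."""
    if gamma_h <= 1.0:
        return 0.0
    chi = 1.0
    for _ in range(iterations):
        chi = 1.0 - math.exp(-gamma_h * chi)
    return chi


def critical_ratio(chi):
    """Critical value of the ratio of reproductive capacities at density chi."""
    return -chi / ((1.0 - chi) * math.log(1.0 - chi))


H1, H2 = 0.001, 0.01                 # habitat coefficients from the text
GAMMA_S, GAMMA_S_PRIME = 400.0, 2000.0

chi_s_first = stable_density(GAMMA_S * H1)         # S in the poorer habitat
chi_s_second = stable_density(GAMMA_S * H2)        # S in the richer habitat
threshold = critical_ratio(chi_s_second)
s_prime_persists_second = GAMMA_S_PRIME / GAMMA_S > threshold
chi_s_prime_first = stable_density(GAMMA_S_PRIME * H1)   # S' alone in habitat 1
```

Here `s_prime_persists_second` comes out false, matching the conclusion that S’ is excluded from the richer habitat while holding the poorer one.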


5. BIOLOGICAL DISCUSSION 


1. The analytical model developed here assumes that dispersal is effectively at random.
This is at least approximately true for large numbers of terrestrial plants and animals. The 
behaviour patterns of certain animals may be such, however, as to tend to lead them to more 
favourable conditions. To deal adequately with these cases it will be necessary to introduce 
a further complication into the theoretical model somewhat analogous to gravitational 
attraction. Nevertheless, in most instances the range of an animal’s perception is small 
compared with its powers of dispersal, and even the more intelligent may not discriminate 
between two parts of a habitat differing considerably in their effect on survival. Local 
irregularities in the character of the environment act as stimuli initiating repeated tactic 
displacements, the ultimate cumulative effect of which is scarcely distinguishable from 
a blind randomness. 

2. If a population is initially located at a centre, spreads as with a Brownian motion, and
undergoes unrestricted growth, the rate of radial expansion of its contours is approximately 
constant (see equations 5, 5a). A similar law applies to the flow of a gene, but here it should 
be noted that y? for a new advantageous mutation will almost always be very small. The 
oft-repeated statement that a new advantageous gene spreads through the population 
replacing the former wild-type allelomorph is probably true (if no more than a reasonable 
period of time is allotted) only in those cases where the powers of dispersal are considerable 
in relation to the area occupied. In assessing the roles played by various processes in bringing 
about speciation, it might be borne in mind that fragmentation of the distribution area is by 
no means a necessary condition for different accumulations of genetic diversity to arise in 
the same species in different parts of a comparatively uniform habitat, and ecological 


adaptation can always occur when the rates of gene flow are insufficient to dissipate the 
genetic advances being made. 
















3. Just as the area/volume ratio is an important concept in connexion with continuance of 
metabolic processes in small organisms, so is the perimeter/area concept (or some equivalent 
relationship) important in connexion with the survival of a community of mobile individuals. 
Though little is known from the study of field data concerning the laws which connect the 
distribution in space of the density of an annual population with its powers of dispersal, 
rates of growth and the habitat conditions, it is possible to conjecture the nature of this 
relationship in simple cases. The treatment of §3 shows that if an isolated terrestrial habitat 
is less than a certain critical size the population cannot survive. If the habitat is slightly 
greater than this the surface which expresses the density at all points is roughly dome-shaped, 
and for very large habitats this surface has the form of a plateau. 

4. The logistic law of population growth is often compared with that of autocatalysis 
(D’Arcy Thompson, 1942) and the inference made that population size is limited by the 
available food supply. In the present paper it is shown from spatial considerations alone that 
many populations which are discrete in time can be expected to satisfy the logistic law—at 
least as a close approximation. 

5. The problem of the relationship between the reproductive capacity of plants and their 
habitat conditions has been studied very thoroughly by Salisbury (1942), but it is by no 
means certain that the thesis he adopts (pp. 159, 160, 231) is entirely tenable. The analytical 
approach of §§4-3-4-5, though no doubt over-simplified, provides us immediately with 
a possible explanation of the well-known paradox that certain species flourish only in 
unpromising situations, and might be developed and applied with advantage to throw light 
on related problems in this confused and bewildering subject. 


REFERENCES 


BROWNLEE, J. (1911). Proc. R. Soc. Edinb. 31, 262.

CHANDRASEKHAR, S. (1943). Rev. Mod. Phys. 15, 1-89.

DE GEER, G. (1910). Int. Geol. Congr. Stockholm.

DENJOY, A. (1937). Théorie des Fonctions de Variables Réelles, 1. Paris: Hermann.

FISHER, R. A. (1922). Proc. R. Soc. Edinb. 42, 327.

FISHER, R. A. (1930). The Genetical Theory of Natural Selection. Oxford: Clarendon Press.

FISHER, R. A. (1937). Ann. Eugen., Lond., 7, 355-69.

GAUSE, G. F. (1935). Vérifications expérimentales de la théorie mathématique de la lutte pour la vie. Paris:
Hermann.

GODWIN, H. (1934). New Phytol. 33.

HALDANE, J. B. S. (1932). The Causes of Evolution. Appendix. London: Longmans, Green.

INCE, E. L. (1927). Ordinary Differential Equations. London: Longmans, Green.

JAHNKE, E. & EMDE, F. (1945). Tables of Functions, 4th ed., p. 205. New York: Dover Publications.

JEANS, J. (1921). The Dynamical Theory of Gases. Cambridge University Press.

JORDAN, E. O. & BURROWS, W. (1945). Textbook of Bacteriology, 14th ed. Philadelphia: Saunders.

KENDALL, D. G. (1948). Proc. Camb. Phil. Soc. 44, 591-4.

KLUYVER, J. C. (1905). Proc. K. Akad. Wet. Amst. 25 Oct., pp. 341-50.

KOSTITZIN, V. A. (1939). Mathematical Biology. London: Harrap.

LEGENDRE, A. M. (1825). Traité des Fonctions Elliptiques, 2. Paris. Tables reissued with an Introduction
by K. Pearson (1934). Cambridge University Press.

LOTKA, A. J. (1925). Elements of Physical Biology. Baltimore.

LOTKA, A. J. (1939). Théorie Analytique des Associations Biologiques, Part 1. Paris: Hermann.

MAXWELL, J. C. (1860). Phil. Mag. [4], 19, 31.

MILNE-THOMSON, L. M. (1933). The Calculus of Finite Differences. London: Macmillan.

NICHOLSON, A. J. & BAILEY, V. A. (1935). Proc. Zool. Soc. Lond., pp. 551-98.

PEARSON, K. (1906). Mathematical Contributions to the Theory of Evolution, XV. Drapers’ Company
Research Memoirs.

RASHEVSKY, N. (1948). Mathematical Biophysics. University of Chicago Press.













RAYLEIGH, LORD (1919). Phil. Mag. [6], 37, 321.

REID, C. (1899). The Origin of the British Flora. London: Dulau.

SALISBURY, E. J. (1942). The Reproductive Capacity of Plants. London: Bell.

SKELLAM, J. G. (1948). Proc. Camb. Phil. Soc. 45, 364.

TANSLEY, A. G. (1939). The British Islands and their Vegetation. Cambridge University Press.

THOMPSON, D’ARCY (1942). Growth and Form. Cambridge University Press.

TITCHMARSH, E. C. (1946). Eigenfunction Expansions associated with Second-Order Differential
Equations. Oxford: Clarendon Press.

ULBRICH, J. (1930). Die Bisamratte. Dresden: Heinrich.

USPENSKY, J. V. (1937). Introduction to Mathematical Probability. New York: McGraw-Hill.

VALIRON, G. (1945). Équations Fonctionnelles. Paris: Masson.

VOLTERRA, V. (1931). Leçons sur la Théorie Mathématique de la Lutte pour la Vie. Paris: Gauthier-
Villars.

WATSON, G. N. (1944). Treatise on Bessel Functions. Cambridge University Press.

WATT, A. S. (1919). On the Causes of Failure of Natural Vegetation in British Oakwoods. J. Ecol. 7
(nos. 3 and 4), 173.

WRIGHT, S. (1931). Genetics, 16, 97-159.

WRIGHT, W. B. (1937). The Quaternary Ice Age. London: Macmillan.

ZINSSER, H. & WILSON, E. B. (1932). J. Prev. Med. 6, 497-514.























THE FREQUENCY DISTRIBUTION OF THE PRODUCT-MOMENT 
CORRELATION COEFFICIENT IN RANDOM SAMPLES OF 
ANY SIZE DRAWN FROM NON-NORMAL UNIVERSES 


By A. K. GAYEN, Statistical Laboratory, Cambridge 


1. INTRODUCTION 


Considering the population to be specified by the normal bivariate surface, Fisher (1915)
obtained the exact sampling distribution of $r$, the sample value of the product-moment
correlation coefficient $\rho$. Experimental investigations were later made, by various workers,
to examine how far the observed distributions of $r$, in samples from non-normal populations
of known form, compared with the corresponding normal-theory values. E. S. Pearson (1929),
in studying some experimental results for samples of 20 and 30 from two considerably
non-normal populations, of which the parental correlations were respectively $\rho = 0{\cdot}5346$ and
$0{\cdot}4626$, concluded that ‘the results suggest that the normal bivariate surface can be mutilated
and distorted to a remarkable degree without affecting the frequency distribution of $r$’. Other
writers, following Pearson, have considered still smaller samples (of 5 to 10 in most cases)
from populations varying widely in the degree of non-normality and the intensity of parental
correlation. In some cases the results of their experiments may be interpreted as agreeing
well with the theoretical normal-theory values but elsewhere such agreement is far from
satisfactory.* The problem does not appear to have been investigated mathematically
except in special cases.

Quensel (1938) derived the frequency density of $r$ in samples from the bivariate Gram-
Charlier population (alternative form, formula (18) below), but the result, as has been stated
by him, is applicable only when the parental correlation $\rho$ is zero. From the derived formulae
of his study he made some observations as to the nature of the frequency distribution of $r$,
in special cases of completely independent variates.

The present paper is devoted to a theoretical study of the distribution of non-normal $r$, in
the general non-null case, i.e. the case in which the parental correlation $\rho$ exists and is not
necessarily small. Considering the population to be expressed by the Edgeworth form of the
bivariate Type A surface (including terms as far as those in $\lambda_{30}^2, \lambda_{30}\lambda_{21}, \ldots, \lambda_{12}\lambda_{03}$ and $\lambda_{03}^2$),
the mathematical form of the frequency density of $r$ is obtained. If higher semi-invariants
other than $\lambda_{40}, \lambda_{31}, \lambda_{22}, \lambda_{13}$ and $\lambda_{04}$, and $\lambda_{30}^2, \lambda_{30}\lambda_{21}, \ldots, \lambda_{12}\lambda_{03}$ and $\lambda_{03}^2$, of the population are
negligible, then the derived distribution holds good for any size of sample. On the other hand,
if the samples are fairly large, the formula will hold asymptotically for any form of parent
population. Thus it is not unlikely that for samples of moderate size, the distribution obtained
has quite an extended range of applicability, provided always the higher semi-invariants
other than those considered are small. Throughout this work the values of $\rho$ and of the $\lambda$’s
of the population have been assumed to be known.

The possibility of the application of Fisher’s logarithmic transformation of the coefficients
of correlation, namely $z = \tfrac12\log_e\{(1 + r)/(1 - r)\}$,


* Cf., in particular, the results in Tables V and VI of Rider (1932). See also Editorial comments in 
that connexion, p. 401, footnote at end of Rider’s paper. 










to the derived frequency function is discussed. The asymptotic formulae for the moment 
constants of the distribution of z in samples from the Edgeworth population are obtainable. 
It is seen that for samples of moderate size, the distribution of z is still approximately normal 
with a mean and standard deviation depending on the fourth and the square and product of 
the third-order semi-invariants (being respectively the measures of ‘excess’ and ‘skewness’) 
of the population, apart from the size of sample N and the coefficient of correlation p as usual. 
Some practical applications of the formulae are considered. 
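
For readers who wish to experiment, Fisher's logarithmic transformation $z = \tfrac12\log_e\{(1 + r)/(1 - r)\}$ and its inverse can be sketched in Python as below. The function names are hypothetical, and the sketch covers only the transformation itself, not the moment formulae for non-normal populations discussed in this paper.

```python
import math


def fisher_z(r):
    """Fisher's transformation of a correlation coefficient, -1 < r < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def inverse_fisher_z(z):
    """Recover r from z; the transformation is the inverse hyperbolic tangent."""
    return math.tanh(z)


z_value = fisher_z(0.5346)            # one of the parental correlations cited above
r_back = inverse_fisher_z(z_value)    # returns the original correlation
```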

In the course of the present work certain errors have been detected in the higher order 
terms of the expansions of the moment coefficients of z, given by Fisher (1921). While these 
terms are not important in most practical applications, they have been used by most of the 
workers on the present problem in examining the closeness of the approximations involved 
in the use of z. It is seen that the correct formulae for what David (1938, p. xxxii) calls 
‘Approximation I’ are still closer to the actual probabilities of r. 


2. JOINT DISTRIBUTION OF THE THREE SECOND-ORDER CENTRAL MOMENTS IN 
SAMPLES FROM THE BIVARIATE EDGEWORTH POPULATION 


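Let us consider the parent population to be specified by the bivariate Edgeworth surface*

$$f(x, y) = \Big\{1 + \sum_{i+j=3,4,6} \frac{(-1)^{i+j} A_{ij}}{i!\,j!}\Big(\frac{\partial}{\partial x}\Big)^i\Big(\frac{\partial}{\partial y}\Big)^j\Big\}\,\phi(x, y), \tag{1}$$

where

$$\phi(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}}\exp\Big[-\frac{1}{2(1 - \rho^2)}(x^2 - 2\rho xy + y^2)\Big], \tag{2}$$

$\rho$ = the population value of the correlation coefficient

$$= \lambda_{11} = \frac{\mu_{11}}{\sqrt{\mu_{20}\mu_{02}}} = \mu_{11}, \tag{3}$$

since the variables are in standard measure, and

$$A_{30} = \lambda_{30} = \mu_{30}, \quad A_{21} = \lambda_{21} = \mu_{21}, \ldots, \tag{4}$$

$$A_{40} = \lambda_{40} = \mu_{40} - 3, \quad A_{31} = \lambda_{31} = \mu_{31} - 3\rho, \quad A_{22} = \lambda_{22} = \mu_{22} - 2\rho^2 - 1, \ldots, \tag{5}$$

$$A_{60} = 10\lambda_{30}^2, \quad A_{51} = 10\lambda_{30}\lambda_{21}, \quad A_{42} = 4\lambda_{30}\lambda_{12} + 6\lambda_{21}^2, \quad A_{33} = \lambda_{30}\lambda_{03} + 9\lambda_{21}\lambda_{12}, \ldots. \tag{6}$$

Here the $\lambda_{ij}$ $\big(= \kappa_{ij}/(\kappa_{20}^{i/2}\kappa_{02}^{j/2}) = \kappa_{ij}$, since $\kappa_{20} = \kappa_{02} = 1\big)$ will be termed the semi-invariants of the
population. Using Quensel’s (1938, pp. 76-80) results, the characteristic function of the
joint distribution of the three second-order central moments,

$$m_{20} = \frac{S(x - \bar{x})^2}{N}, \quad m_{11} = \frac{S(x - \bar{x})(y - \bar{y})}{N} \quad\text{and}\quad m_{02} = \frac{S(y - \bar{y})^2}{N}, \tag{7}$$

where

$$\bar{x} = \frac{S(x)}{N}, \quad \bar{y} = \frac{S(y)}{N}, \tag{8}$$

* Extending his generalized law of error to two dimensions, Edgeworth considers terms involving
moments up to the third order only, $i + j = 3$. The form of approximation (1), where terms in
$\lambda_{30}, \lambda_{21}, \ldots, \lambda_{03}$; $\lambda_{40}, \lambda_{31}, \ldots, \lambda_{04}$ and $\lambda_{30}^2, \lambda_{30}\lambda_{21}, \ldots, \lambda_{03}^2$
are included, has been considered by Wicksell (1917), who has studied also, among other things, the
regression of its characteristics. In the present work it will be referred to as the Edgeworth surface or
population.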





in random samples of size $N$, from the population (1), is found to be


U (too, t11, tog) = E{exp [ityg Mao + ity, M4, + igs Mop)} 


N-1) 
 hentles: [2 + aac {AgoT¥o + 4Ag1 720711 + 2Age (T2072 + 2791) + 4013702711 + Ang Toa} 


N-1)(N- 
+ aia {02,735 + 6A 0421 72713 + 6A 30412720781 + 3A, (730 Toe + 2797?) 


+ 2AgoAgg Tix + 6Agy Aya(731 + 2729711 Ta) + 3AZo(T 29782 + 27027 ir) 


Biaserepaeicicc (9) 
where i = ,/(—1), and 


D={(1 2a) (1 rat) 4p Se —2p (l- py) (“ (10) 
and Tao = — (P%tog + Pity, + ttog) — 1, 
™ = p— 5+ 7p Pitee+ (1 +p?) ity, + 2pttos), > (11) 





‘SR 544 : . 
Te =D ND (itgo + Pity, + pttyg) — 1. 


As a suitable check to the result (9), the mean and the variance of the distribution of the
product-moment coefficient $m_{11}$, calculated directly from this formula, were found to agree
with Pepper’s (1929, pp. 233-4) results, namely

$$\text{mean } m_{11} = \frac{N - 1}{N}\rho,$$

and

$$\text{var } m_{11} = \frac{N - 1}{N^3}\{1 + (N - 1)\mu_{22} - (N - 2)\rho^2\} = \frac{N - 1}{N^3}\{N(1 + \rho^2) + (N - 1)\lambda_{22}\}, \tag{12}$$

which he deduced by a different method.
Now to derive the expression for the joint frequency function of $m_{20}$, $m_{11}$ and $m_{02}$ we shall
have to evaluate the integral*


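$$f(m_{20}, m_{11}, m_{02}) = \frac{1}{(2\pi)^3}\iiint_{-\infty}^{\infty} U(t_{20}, t_{11}, t_{02})\exp[-it_{20}m_{20} - it_{11}m_{11} - it_{02}m_{02}]\,dt_{20}\,dt_{11}\,dt_{02}, \tag{13}$$

a typical term of which is of the form

$$f_{\nu;\nu_1,\nu_2,\nu_3}(m_{20}, m_{11}, m_{02}) = \frac{1}{(2\pi)^3}\iiint_{-\infty}^{\infty} D^{-\frac12(N-1)-\nu}(it_{20})^{\nu_1}(it_{11})^{\nu_2}(it_{02})^{\nu_3}\exp[-it_{20}m_{20} - it_{11}m_{11} - it_{02}m_{02}]\,dt_{20}\,dt_{11}\,dt_{02}, \tag{14}$$

for $\nu$ varying from 0 to 3 and $\nu_1 + \nu_2 + \nu_3 \le 3$.
The value of (14) for $\nu_1 = \nu_2 = \nu_3 = 0$, i.e. the integral

$$f_{\nu;0,0,0}(m_{20}, m_{11}, m_{02}) = \frac{1}{(2\pi)^3}\iiint_{-\infty}^{\infty} D^{-\frac12(N-1)-\nu}\exp[-it_{20}m_{20} - it_{11}m_{11} - it_{02}m_{02}]\,dt_{20}\,dt_{11}\,dt_{02} \tag{15}$$

* $f$ is defined as zero except for $m_{20} > 0$, $m_{20}m_{02} > m_{11}^2$.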













will be found from Romanovsky’s (1925) formula (111), as 
N? \#N-D+ (agg Mog — Mj, )H OH” 
SAM» Mox) = Fe al) (WV + 2r—3)! 
N 
x exp [ - 2(1- 2(1 —p?) (Moo — 2pm, + ma) | . (16)* 


Accordingly, whatever may be the value of v,, v, and v3, the integral in (14) may be 
evaluated by successive differentiation of (16), with respect to mo, 4, and mp9. Thus we have 


Fi svisve,vs(Me2os 11, M2) = (— 1ysirin(_)" (--)" (--)" J, (Mao, 44, Mq). (17) 


The formulae (102)-(104) of Quensel, giving the forms of a typical term in the joint
distribution of $m_{20}$, $m_{11}$ and $m_{02}$, in samples from the Charlier Type A population of which
the correlation $\rho$ is not necessarily small, involve functions of the type (17). But excepting
in the special case $\rho = 0$, they have not been evaluated in explicit form. Quensel derives the
joint distribution of $m_{20}$, $m_{11}$ and $m_{02}$, and the frequency density of $r$, for the alternative form
of expansion of the Type A population, namely,





pues | + ge (- DH /(a)t/a\9 
Pay=\l+ Bd Aaa (=) (=) | $4, (18) 
where $2) = Taye and ¢(y) = “ra; ew, (19) 


a form applicable only when $\rho$ is small. The advantage of such a form of frequency surface, (18),
is that the characteristic function of the joint distribution of the three second-order central
moments comes out to be very simple, for the terms containing $\rho$ do not appear in it. As may
be noticed, the expressions (10) and (11) occurring in the characteristic function (9), are
considerably reduced in form when $\rho$ is zero.
In the present case, as we need to consider only values of $\nu_1$, $\nu_2$ and $\nu_3$ up to $\nu_1+\nu_2+\nu_3 \le 3$, the
complete expressions for (17) can be deduced for any $\rho$ without much difficulty. Substituting
them in (13), and after some simplification, if we write

$$\begin{aligned}
(1-\rho^2)M_{20} &= m_{20} - 2\rho m_{11} + \rho^2 m_{02},\\
(1-\rho^2)M_{11} &= \rho m_{20} - (1+\rho^2)m_{11} + \rho m_{02},\\
(1-\rho^2)M_{02} &= \rho^2 m_{20} - 2\rho m_{11} + m_{02},
\end{aligned} \tag{20}$$

then the joint distribution of $m_{20}$, $m_{11}$ and $m_{02}$ will be found as

F (m9, M11; Mo2) 


= fol, Mey) | 1+ beh 


8N(N+1)(1—p 

— 4A5[N? My My, — N(N +1) (pM + My) +(N +1)(N —1)p] 

+ 2Aga[N?( Mop Mo, + 2M},) — N(N + 1) (Mao + 49M, + Mog) + (N + 1)(N — 1) (1 + 2p%)] 
— 4Ay9[ N*M, My, — N(N + 1) (op Mog + My) +(N +1)(N—1)p] 

+Ag[N?27M2, — 2N(N +1) My +(N +1)(N-1)}} 


* Quensel's formulae (100)-(101), giving the expression for the function $f_\nu(m_{20}, m_{11}, m_{02})$, involve
a printer's error, as the correct form of $C_n$ should be (in his notation)

$$\left(\frac{N^2}{1-r^2}\right)^{n}\frac{(N-3)!}{(N+2n-3)!}.$$





aya Auol N*M — 2N(N + 1) May + (N +1) (N -1)] 
















A. K. GAYEN 223 


(N — 2) 
+ ToN(W +1) (NV +3) (1—p%) 





A3,[N3M3, —3N2(N +3) M2,+3N(N +3)(N +1) Moy 
—(N+3)(N+1)(N-1)] 
— 6A59Az;[N3M3, M,, — N2(N +3) Mao( Moo + 2M,;) 
+ N(N +3) (N +1) (2oMg) + M,,)—(N + 3)(N+1)(N—-1)p] 
+ BAgoAyo[ N32 Moy M2, — N2(N +3) Mj,(20Mgy +-M,,) + N(N + 3)(N +1) p(Mgy + 2M,,) 


2 
_(N +3) (N41) (1) p*— ar (map tgg— mh) (N Map — (WN +3))] 


+ BAZ, [N3Myq( Moo Moz + 2M3,) — N2(N + 3){( Moo + 2Moy) Mog + 2(20Moq + My,)My3} 
+ N(N +3) (N +1) {2(1 +p?) Moy + 49M, + Mg} 


D 2 
_(N+3)(N+1)(N—1)(1 + 2p8) + a (mao Mag — mh) (WM yy — (N+ 3))] 


— 2Agp Ags |N3M3, — 3N2(N + 3) pM}, +3N(N +3)(N +1)p2M,, 


2 
(N —2) 
— 6Ag, Ayo[N 2M, (M3, + 2M Moy) — N2(N + 3) {2Myq( Mo + My,) + Myy(2Mog + 39M,,)} 


—(N +3)(N + 1)(N—1)p?- (1Mg9 Mo — m39,) (NM, —(N +3) p)] 


+ N(N +3) (N +1) {20( Myo + Mog) + (2 + 3p?) M,3} 
N? 
—(N+3)(N +1) (N~1)(2+p*) p+ rap —y (man an— Mh) (Wy —(N +3) )] 


+ 3A2,[N3Moo( Moo Mo. + 2M3,) — N2(N + 3) {( Mog + 2Mo) Moz + 2(20- Moo + Mj,) Mis} 
+ N(N +3) (N + 1) {2(1 +p?) Mo. + 40M), + Moo} 


—(N+3)(N+4+1)(N—1) (1+ 2p?) tay (Mop Moq — m?,) (N My, — (N + 3))] 
+ 6Agg Ag, [N2Mo. M3, — N2(N + 3) M,,(29Mo. + M,,) + N(N +3) (N +1) p(pMog + 2Mj;) 
— (N+ 3) (+1) (N= 1) 8a (map maa mi) (N Mg (N+ 3) 
~ 6Agg Ayal N® M3, My, — N%N +3) Moo( Mogg + 2M,1) 
+ N(N +3) (N +1) (20M + M,,) —(N + 3)(N +1)(N—1)p] 
+ A3,[ N3M3, —3N2(N + 3) M2, + 3N(N + 3)(N +1) My. —(N +3) (N+1)(N- 1} | , (21) 


where 








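$$f_0(m_{20}, m_{11}, m_{02}) = \frac{1}{4\pi}\left(\frac{N^2}{1-\rho^2}\right)^{\frac12(N-1)}\frac{(m_{20}m_{02} - m_{11}^2)^{\frac12(N-4)}}{(N-3)!}\exp\left[-\frac{N}{2(1-\rho^2)}(m_{20} - 2\rho m_{11} + m_{02})\right] \tag{22}$$

is the normal-theory frequency function of $m_{20}$, $m_{11}$ and $m_{02}$.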













3. THE DISTRIBUTION OF THE PRODUCT-MOMENT CORRELATION COEFFICIENT 


In the joint distribution (21) of the variables $m_{20}$, $m_{11}$ and $m_{02}$ let us introduce the new
variable $r$, the product-moment correlation coefficient of the sample, by the substitution

$$m_{11} = r\sqrt{(m_{20}m_{02})}. \tag{23}$$

Next, as usual, putting

$$m_{20} = we^{t}, \quad m_{02} = we^{-t} \quad\text{and}\quad r = r, \tag{24}$$

integrating for $w$ and writing

$$I_N = \int_0^\infty \frac{dt}{(\cosh t - \rho r)^N}, \tag{25}$$


we obtain the frequency density of r, in the form 


fr) = 2? — payer (1 —19)10Y—9G yy + (04g) Ley — Pan) Fo + (a) — (oa) Teva} 
+ {(YG0) Lov—» — 7(¥ 51) Lenn + (7(Va2) — (Veg) Levan — (77(¥15) — 706) In ia}], (26) 


where $(\nu_{ij})$ are expressions involving, besides $N$ and $\rho$, the fourth and the square and product
of the third order semi-invariants of the population, for $i+j = 4$ and 6 respectively.

The result (26) undergoes further simplification when the recurrence relations (Co-operative
Study (1917) formula xxxviii), namely,

$$(N-1)I_{N-1} + (2N-1)\rho r I_N = N(1-\rho^2r^2)I_{N+1}, \tag{27}$$

between the integrals of the form (25), are used. Also putting

$$\psi(r,\rho) = \frac{N-2}{\pi}(1-\rho^2)^{\frac12(N-1)}(1-r^2)^{\frac12(N-4)}I_{N-1} \tag{28}$$


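and differentiating successively with respect to $\rho$, it will be seen that

$$(N-1)\rho\,\psi(r,\rho) + (1-\rho^2)\frac{\partial}{\partial\rho}\psi(r,\rho) = \frac{(N-2)(N-1)}{\pi}(1-\rho^2)^{\frac12(N+1)}(1-r^2)^{\frac12(N-4)}\,rI_N, \tag{29}$$

$$(N-1)(N\rho^2+1)\psi(r,\rho) + 2(N-1)\rho(1-\rho^2)\frac{\partial}{\partial\rho}\psi(r,\rho) + (1-\rho^2)^2\frac{\partial^2}{\partial\rho^2}\psi(r,\rho) = \frac{(N-2)(N-1)N}{\pi}(1-\rho^2)^{\frac12(N+3)}(1-r^2)^{\frac12(N-4)}\,r^2I_{N+1}, \tag{30}$$

and

$$(N-1)(N+1)\rho(N\rho^2+3)\psi(r,\rho) + 3(N-1)(N\rho^2+1)(1-\rho^2)\frac{\partial}{\partial\rho}\psi(r,\rho) + 3(N-1)\rho(1-\rho^2)^2\frac{\partial^2}{\partial\rho^2}\psi(r,\rho) + (1-\rho^2)^3\frac{\partial^3}{\partial\rho^3}\psi(r,\rho) = \frac{(N-2)(N-1)N(N+1)}{\pi}(1-\rho^2)^{\frac12(N+5)}(1-r^2)^{\frac12(N-4)}\,r^3I_{N+2}. \tag{31}$$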
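The recurrence relation (27) can be checked numerically. The sketch below assumes the integrals of form (25) are $I_n = \int_0^\infty dt/(\cosh t - \rho r)^n$ and approximates them with a composite Simpson rule; the function names, the truncation point `upper` and the step count are illustrative choices, not part of the original derivation.

```python
import math

def integral_I(n, a, upper=30.0, steps=20000):
    """Simpson approximation to I_n = int_0^inf dt / (cosh t - a)^n for |a| < 1.

    The integrand decays like exp(-n t), so truncating at `upper` is harmless
    for n >= 1. `steps` must be even.
    """
    h = upper / steps

    def g(t):
        return (math.cosh(t) - a) ** (-n)

    total = g(0.0) + g(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * g(j * h)
    return total * h / 3.0

def recurrence_residual(N, rho, r):
    """Relative residual of (N-1) I_{N-1} + (2N-1) rho r I_N = N (1 - rho^2 r^2) I_{N+1}."""
    a = rho * r
    lhs = (N - 1) * integral_I(N - 1, a) + (2 * N - 1) * a * integral_I(N, a)
    rhs = N * (1 - a * a) * integral_I(N + 1, a)
    return (lhs - rhs) / rhs

if __name__ == "__main__":
    for N, rho, r in [(11, 0.8, 0.8), (21, 0.8, -0.5)]:
        print(N, rho, r, recurrence_residual(N, rho, r))
```

The residual should be zero up to quadrature error, whatever admissible values of $N$, $\rho$ and $r$ are tried.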


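Using (27) in (26), and then applying the results of (28)-(31), we obtain the required
frequency density of $r$ as

$$f(r) = \psi(r,\rho) + \frac{N-1}{8N(N+1)}\left\{L_{4,1}\frac{\partial}{\partial\rho}\psi(r,\rho) + L_{4,2}\frac{\partial^2}{\partial\rho^2}\psi(r,\rho)\right\} + \frac{N-2}{12N(N+1)(N+3)}\left\{L_{6,1}\frac{\partial}{\partial\rho}\psi(r,\rho) + L_{6,2}\frac{\partial^2}{\partial\rho^2}\psi(r,\rho) + L_{6,3}\frac{\partial^3}{\partial\rho^3}\psi(r,\rho)\right\}, \tag{32}$$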














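where $\psi(r,\rho)$, given in (28), is the normal-theory frequency function of $r$, in samples of
size $N$. Further

$$\begin{aligned}
L_{4,1} &= 3\rho(\lambda_{40}+\lambda_{04}) - 4(\lambda_{31}+\lambda_{13}) + 2\rho\lambda_{22},\\
L_{4,2} &= \rho^2(\lambda_{40}+\lambda_{04}) - 4\rho(\lambda_{31}+\lambda_{13}) + 2(2+\rho^2)\lambda_{22},
\end{aligned} \tag{33}$$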


these being expressions involving p and the fourth order semi-invariants of the population, 
while 


6 
Le, = — Ldp(aj o t ARs) — (1+ 55 3) (Ag, + Xie) — Fa pA0%0s 
1 
. 6(2+5—5 3) Aa Ais t+ 18(AsoAa + AoaA a) +9 (Aso Aye + AgsAz1); 
= 18 
Le,2 = = 9p¥(ABy + Abs) 3 (44+ 5p ee (ad +28 2)— W_9 p 5 As003 


2—5p 
+189(2 +575 5) AaAaa + 309 (a Joy + AgsAi2) — 6(2+ 57 F) (aodae + Ao Azi)>$ (34) 








—p2 = 
Las = —PMABa + Ads)— 8p (2+ 2 “= F)) (ag, at) +2(1+ “= 29) agodos 








1-9? ‘ 
+6 (1 + 2p? a Aoi Are + 6P?(Ago Aas + Ags Ar2) 


N 


~t 
— 6p (1 +y 5) (AgoAr2 + Ags Azi)s 
these being expressions involving $N$, $\rho$ and the square and product of the third-order semi-
invariants of the population.

Of the two subscripts in each L coefficient, the first one refers to the order of the semi- 
invariants occurring in it and the second to that of the derivative of y(r,p), to which it is 
attached. 

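As a check to (32), it will be seen that

$$\int_{-1}^{1} f(r)\,dr = \int_{-1}^{1}\psi(r,\rho)\,dr = 1, \tag{35}$$

since

$$\int_{-1}^{1}\left(\frac{\partial}{\partial\rho}\right)^{\nu}\psi(r,\rho)\,dr = \left(\frac{\partial}{\partial\rho}\right)^{\nu}\left[\int_{-1}^{1}\psi(r,\rho)\,dr\right] = 0. \tag{36}$$

Now that the frequency distribution of non-normal $r$ is found, as in (32), in terms of the
derivatives with respect to $\rho$ of the normal-theory frequency function of $r$, the moments of
any order of the former may be obtained without difficulty. For

$$\int_{-1}^{1} r^k\left(\frac{\partial}{\partial\rho}\right)^{\nu}\psi(r,\rho)\,dr = \left(\frac{\partial}{\partial\rho}\right)^{\nu}\left[\int_{-1}^{1} r^k\psi(r,\rho)\,dr\right] = \left(\frac{\partial}{\partial\rho}\right)^{\nu}\mu'_k(r), \tag{37}$$

where $\mu'_k(r)$ is the $k$th moment of the normal-theory distribution of $r$, and is already known
(Co-operative Study (1917) and Romanovsky (1925)).
Up to $N^{-2}$ the mean and variance of $r$, calculated directly from (32), are found to be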





meanr = p— afte - Pp?) — (Bp(Ago + Agog) — $(Agi + Ars) + FPAz2)} 


— a Bot —p2) (1+ 3%) 
£8A(1 — p?) (Ago + Aga) + £(99? — 5) (Agi + Ags) — $P(9P? + 7) Ags 
+ $(Ago + Ais) + (Ads + Aja) — 3(AsoAer + AosArz)—AarArg}, (38) 


and 


varr = will — p®)? + 3 p*(Ago + Aga) — P(Agi + Arg) + $(2 +?) Aga} 


+ pa(b —p%) (2+ 1p? 
— (4p?(11 — 15p*) (Ago + Agg) — (7 — Lp?) (Ag, + Ags) + $(10— 1 1p? — 11p4) Ags) 
— (§p7(Ago + Ags) + $(4 + 5p?) (AG, + Ag) — 5(Ago Agr + Ags Arz) 
+ 2(Ago Aye + AogAgy) — 6PAgAr2)}- (39) 
It will be seen that to $N^{-1}$ they are in agreement with the known results.
From a different approach, Tschuprow (1925, pp. 114-15, English Translation) reached
an asymptotic expression, correct to $N^{-2}$, for the mean value of the sampling distribution
of $r$, which, as has been stated, holds good for any law of dependence. The formula involves
the parameters

$$q_{ij} = \frac{\mu_{ij}}{\sqrt{(\mu_{20}^{\,i}\,\mu_{02}^{\,j})}} \tag{40}$$

of the parent population, up to $i+j = 6$ (Tschuprow's notation for $q_{ij}$ was $r_{ij}$, but the former
has been used here to avoid confusion; also his $r_{11}$ is our $\rho$).


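As a very useful check, the result obtained from his formula by the substitution of

$$\begin{aligned}
q_{30} &= \lambda_{30}, & q_{40} &= \lambda_{40} + 3,\\
q_{21} &= \lambda_{21}, & q_{31} &= \lambda_{31} + 3\rho,\\
 & & q_{22} &= \lambda_{22} + 2\rho^2 + 1
\end{aligned} \tag{41}$$

and

$$\begin{aligned}
q_{60} &= 10\lambda_{30}^2 + 15\lambda_{40} + 15,\\
q_{51} &= 10\lambda_{30}\lambda_{21} + 10\lambda_{31} + 5\rho\lambda_{40} + 15\rho,\\
q_{42} &= 4\lambda_{30}\lambda_{12} + 6\lambda_{21}^2 + \lambda_{40} + 8\rho\lambda_{31} + 6\lambda_{22} + 3(1+4\rho^2),\\
q_{33} &= \lambda_{30}\lambda_{03} + 9\lambda_{21}\lambda_{12} + 3(\lambda_{31}+\lambda_{13}) + 9\rho\lambda_{22} + 3\rho(3+2\rho^2),
\end{aligned} \tag{42}$$

(which are the values of the above $q$-parameters, equation (40), for the Edgeworth
population (1)), was found to agree with our expression (38), for mean $r$, up to the same order of
approximation.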
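The substitutions (41) and (42) are easy to mistype, so a small numerical sketch may help. It assumes standardized variables (unit variances, $q_{11} = \rho$) and semi-invariants of order five and above equal to zero, as for the Edgeworth population (1); the function name and argument layout are illustrative only. For a normal population every $\lambda$ vanishes and the sixth-order parameters reduce to the familiar normal product moments, which gives a convenient check.

```python
def q_parameters(rho, lam3, lam4):
    """Standardized moments q_ij from semi-invariants, equations (41)-(42).

    lam3 = (l30, l21, l12, l03); lam4 = (l40, l31, l22, l13, l04).
    Semi-invariants of order >= 5 are assumed to be zero.
    """
    l30, l21, l12, l03 = lam3
    l40, l31, l22, l13, l04 = lam4
    return {
        "q30": l30,
        "q21": l21,
        "q40": l40 + 3.0,
        "q31": l31 + 3.0 * rho,
        "q22": l22 + 2.0 * rho**2 + 1.0,
        "q60": 10.0 * l30**2 + 15.0 * l40 + 15.0,
        "q51": 10.0 * l30 * l21 + 10.0 * l31 + 5.0 * rho * l40 + 15.0 * rho,
        "q42": (4.0 * l30 * l12 + 6.0 * l21**2 + l40 + 8.0 * rho * l31
                + 6.0 * l22 + 3.0 * (1.0 + 4.0 * rho**2)),
        "q33": (l30 * l03 + 9.0 * l21 * l12 + 3.0 * (l31 + l13)
                + 9.0 * rho * l22 + 3.0 * rho * (3.0 + 2.0 * rho**2)),
    }

if __name__ == "__main__":
    # Normal population: all lambdas zero.
    print(q_parameters(0.5, (0, 0, 0, 0), (0, 0, 0, 0, 0)))
```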


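As a special case, for $\rho = 0$, the frequency function of $r$, equation (32), is found to be

$$f_0(r) = \psi(r,0) + \frac{N-1}{8N(N+1)}\left\{l_{4,1}\left[\frac{\partial}{\partial\rho}\psi(r,\rho)\right]_{\rho=0} + l_{4,2}\left[\frac{\partial^2}{\partial\rho^2}\psi(r,\rho)\right]_{\rho=0}\right\} + \frac{N-2}{12N(N+1)(N+3)}\left\{l_{6,1}\left[\frac{\partial}{\partial\rho}\psi(r,\rho)\right]_{\rho=0} + l_{6,2}\left[\frac{\partial^2}{\partial\rho^2}\psi(r,\rho)\right]_{\rho=0} + l_{6,3}\left[\frac{\partial^3}{\partial\rho^3}\psi(r,\rho)\right]_{\rho=0}\right\}, \tag{43}$$

where

$$l_{4,1} = -4(\lambda_{31}+\lambda_{13}), \qquad l_{4,2} = 4\lambda_{22}, \tag{44}$$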
3(2N —3 
and = (Ago Agi + AggAi2) — Wap 20A0 + — 5) shed IN mre 
N-3 N- 
b2=-65-5 =o (Aaa + Ae) — 6x5 = (AsoAs 2 tAggAz:), > (45) 


N+1 N-3 
le, 3% V- Wag 20403 + Sagar: 

















On simplification, the result (43) has been put in Quensel’s form (his formula (125), p. 100), 











— felt) = get) — Bye ox(7) + 3, Ba osoule)— 2 Baan), (46) 

where 64°) = Bay tO (47) 
"= aye aby rari ){- (Asi + Ars) + Wai ihre + Wa 3 aoe + Avo Aas) +755 + aeinru)}, 
48 

By = [ee reg (Andee + oon + Fea a a a 

wt ea tse GTA ate Pat 


The E-coefficients, formulae (48)-(50), will be found to agree with the corresponding results
deduced from Quensel's expression (126), by putting $\rho$ or $\lambda_{11} = 0$.

The simplicity of the frequency function of $r$ obtained in (32) suggests that it might have
been reached, otherwise, by fewer steps, had it been possible to express the joint distribution
of $m_{20}$, $m_{11}$ and $m_{02}$, or its characteristic function or the population itself, in terms of the
derivatives with respect to $\rho$ of the corresponding normal-theory functions. But attempts on
these lines have not been successful. Obviously, the population cannot be expressed in terms
of the derivatives with respect to $\rho$ of the normal bivariate surface. It will be noticed, however,
that the use of recurrence relation (27) has contributed a good deal towards a finally
compact form, as in (32), of the frequency function of $r$. Neither in the case of the joint
distribution of the quadratic moments nor in the case of its characteristic function has it
been possible to make use of similar recurrence relations.


4. THE ASYMPTOTIC PROPERTIES OF THE DERIVED DISTRIBUTION AND THE
CORRECTIVE FUNCTIONS FOR POPULATION EXCESS AND SKEWNESS


It will be noted that the distribution of $r$, obtained in (32), holds good for any size of sample
from the population (1), if the values of the higher $\lambda$'s of the population, other than
$\lambda_{40}, \lambda_{31}, \ldots, \lambda_{04}$ and $\lambda_{30}^2, \lambda_{30}\lambda_{21}, \ldots, \lambda_{03}^2$, are negligible. Tschuprow's formulae for the moments
of $r$ indicate that the higher population parameters can only occur in terms in $N^{-2}$ (or in
higher negative powers of $N$). So that, whatever may be the population form, provided the
samples are fairly large, the same formula (32) will be sufficiently accurate for the distribution
of $r$. Thus it is not unlikely that the derived law has quite an extended range of
applicability, in moderate size of samples, provided always the fifth and higher order
semi-invariants of the population are small.

As is seen, the expression for $f(r)$, (32), consists of three components. The first being the
normal-theory frequency function of $r$, the other two may be called the corrective functions
due to population excess (as measured by the fourth-order semi-invariants) and skewness
(as measured by the square and product of the third-order semi-invariants) respectively.

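Let us write

$$\psi_{4,1}(r,\rho) = \frac{N-1}{8N(N+1)}\frac{\partial}{\partial\rho}\psi(r,\rho), \qquad \psi_{4,2}(r,\rho) = \frac{N-1}{8N(N+1)}\frac{\partial^2}{\partial\rho^2}\psi(r,\rho), \tag{51}$$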




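which, being the coefficients of $L_{4,1}$ and $L_{4,2}$ in (32), will be known as corrective factors due
to their population values since, when the population is normal, these L-coefficients vanish.
Similarly,

$$\psi_{6,1}(r,\rho) = \frac{N-2}{12N(N+1)(N+3)}\frac{\partial}{\partial\rho}\psi(r,\rho), \quad \psi_{6,2}(r,\rho) = \frac{N-2}{12N(N+1)(N+3)}\frac{\partial^2}{\partial\rho^2}\psi(r,\rho), \quad \psi_{6,3}(r,\rho) = \frac{N-2}{12N(N+1)(N+3)}\frac{\partial^3}{\partial\rho^3}\psi(r,\rho) \tag{52}$$

will be known as the corrective factors due to $L_{6,1}$, $L_{6,2}$ and $L_{6,3}$ of the population respectively.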

The values of the ordinates of the corrective factors (51) and (52) are given in Table 1, for
$N = 11$, $\rho = 0.0$, and for $N = 11$ and 21, $\rho = 0.8$. David's (1938) tables of the correlation
coefficient have been used for the purpose of the tabulation work. The data of Table 1 below
have been given at some length because they provide a point from which one can start to
explore the distribution of $r$ for different populations. The selection of two illustrative values
of $N$ and $\rho$ is, as such, useful.
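The ordinates in Table 1 can be reproduced approximately without David's tables. The sketch below is not the tabulation method used in the paper: it assumes the standard series form of the normal-theory density of $r$, replaces the analytical derivatives with respect to $\rho$ by central finite differences (step `h` is an arbitrary choice), and takes the multipliers of the derivatives to be $(N-1)/\{8N(N+1)\}$ and $(N-2)/\{12N(N+1)(N+3)\}$, as read from (51) and (52). All function names are illustrative.

```python
import math

def psi(r, rho, N, terms=400):
    """Normal-theory density of r for sample size N (series form, |rho r| < 1)."""
    if abs(r) >= 1.0:
        return 0.0
    log_c = ((N - 3) * math.log(2.0) - math.log(math.pi) - math.lgamma(N - 2)
             + 0.5 * (N - 1) * math.log1p(-rho * rho)
             + 0.5 * (N - 4) * math.log1p(-r * r))
    x = 2.0 * rho * r
    total = 0.0
    for k in range(terms):
        if x == 0.0:
            # Only the k = 0 term survives when rho * r = 0.
            term = math.exp(2.0 * math.lgamma((N - 1) / 2.0)) if k == 0 else 0.0
        else:
            log_term = (2.0 * math.lgamma((N - 1 + k) / 2.0)
                        - math.lgamma(k + 1) + k * math.log(abs(x)))
            sign = -1.0 if (x < 0.0 and k % 2 == 1) else 1.0
            term = sign * math.exp(log_term)
        total += term
    return math.exp(log_c) * total

def corrective_functions(r, rho, N, h=1e-3):
    """psi and the corrective functions of (51)-(52), derivatives by finite differences."""
    def p(x):
        return psi(r, x, N)
    d1 = (p(rho + h) - p(rho - h)) / (2.0 * h)
    d2 = (p(rho + h) - 2.0 * p(rho) + p(rho - h)) / h**2
    d3 = (p(rho + 2 * h) - 2.0 * p(rho + h) + 2.0 * p(rho - h) - p(rho - 2 * h)) / (2.0 * h**3)
    c4 = (N - 1) / (8.0 * N * (N + 1))
    c6 = (N - 2) / (12.0 * N * (N + 1) * (N + 3))
    return {"psi": p(rho), "psi41": c4 * d1, "psi42": c4 * d2,
            "psi61": c6 * d1, "psi62": c6 * d2, "psi63": c6 * d3}

if __name__ == "__main__":
    for r in (0.0, 0.05, 0.25):
        row = corrective_functions(r, 0.0, 11)
        print(r, {k: round(1000.0 * v, 2) for k, v in row.items()})
```

Multiplying the results by 1000 gives figures directly comparable with the tabulated ordinates for the case $\rho = 0$, $N = 11$.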

It is seen from the results of Table 1 that in the null case, i.e. when $\rho = 0$, the effect of the
parent non-normality is not serious, even for values of $N$ as low as 11. For the case $\rho = 0.8$,
on the other hand, the effects are not negligible. In particular, the corrective factors $\psi_{4,2}(r,\rho)$
and $\psi_{6,3}(r,\rho)$ seem always to have a considerable part in distorting the normal-theory
frequency curve of $r$, even when the sample size is gradually increased.

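However, the magnitude of the effective corrections is also dependent on the population
values of the L-coefficients, (33) and (34). Investigations into possible limits of these, for
given values of $\rho$, have not produced any very useful results. It is worth noting that two of
the L-coefficients, namely $L_{4,1}$ and $L_{4,2}$, occur in the well-known first approximations to the
mean and variance of $r$, for any population, as corrective terms to their corresponding
normal-theory values, thus

$$\operatorname{mean} r \simeq \rho + \frac{\rho}{8N}\left\{3\left(\frac{\mu_{40}}{\mu_{20}^2} + \frac{\mu_{04}}{\mu_{02}^2}\right) - 4\left(\frac{\mu_{31}}{\mu_{11}\mu_{20}} + \frac{\mu_{13}}{\mu_{11}\mu_{02}}\right) + \frac{2\mu_{22}}{\mu_{20}\mu_{02}}\right\} = \rho + \frac{1}{N}\left\{-\tfrac12\rho(1-\rho^2) + \tfrac18 L_{4,1}\right\} \tag{53}$$

and

$$\operatorname{var} r \simeq \frac{\rho^2}{N}\left\{\frac{\mu_{22}}{\mu_{11}^2} + \frac14\left(\frac{\mu_{40}}{\mu_{20}^2} + \frac{\mu_{04}}{\mu_{02}^2} + \frac{2\mu_{22}}{\mu_{20}\mu_{02}}\right) - \left(\frac{\mu_{31}}{\mu_{11}\mu_{20}} + \frac{\mu_{13}}{\mu_{11}\mu_{02}}\right)\right\} = \frac{1}{N}\left\{(1-\rho^2)^2 + \tfrac14 L_{4,2}\right\}. \tag{54}$$

Accordingly, the mean is affected by the population value of $L_{4,1}$ and the variance by that
of $L_{4,2}$. Also, since $\lim_{N\to\infty}(N \operatorname{var} r) \ge 0$,

$$L_{4,2} \ge -4(1-\rho^2)^2, \tag{55}$$

an inequality which sets a lower limit to $L_{4,2}$ for a given value of the population correlation
coefficient $\rho$.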
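A short sketch of the first approximations (53) and (54) and the limit (55) follows, taking them in the form mean $r \simeq \rho + N^{-1}\{-\tfrac12\rho(1-\rho^2) + \tfrac18 L_{4,1}\}$ and var $r \simeq N^{-1}\{(1-\rho^2)^2 + \tfrac14 L_{4,2}\}$. The function names are illustrative; setting both L-coefficients to zero gives the normal-theory values.

```python
def first_order_mean_var(rho, N, L41=0.0, L42=0.0):
    """First approximations to the mean and variance of r, equations (53)-(54)."""
    mean_r = rho + (-0.5 * rho * (1.0 - rho**2) + L41 / 8.0) / N
    var_r = ((1.0 - rho**2) ** 2 + L42 / 4.0) / N
    return mean_r, var_r

def l42_lower_limit(rho):
    """Lower limit of L_{4,2} implied by a non-negative limiting variance, (55)."""
    return -4.0 * (1.0 - rho**2) ** 2

if __name__ == "__main__":
    print(first_order_mean_var(0.8, 11))   # normal population
    print(l42_lower_limit(0.8))
```

At the limit (55) the first-order variance vanishes, which is a quick way to confirm that the two expressions are mutually consistent.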




Table 1. Corrective functions for the frequency distribution of the correlation coefficient
r in samples from non-normal populations

Case: ρ = 0.0, N = 11. The functions give ordinates multiplied by 1000

Columns: r; normal-theory frequency function ψ(r, 0); corrective functions for excess
ψ4,1(r, 0), ψ4,2(r, 0); corrective functions for skewness ψ6,1(r, 0), ψ6,2(r, 0), ψ6,3(r, 0)

     r      ψ(r,0)  ψ4,1(r,0)  ψ4,2(r,0)  ψ6,1(r,0)  ψ6,2(r,0)  ψ6,3(r,0)
 -1.00        0.00       0.00       0.00       0.00       0.00       0.00
 -0.95        0.34      -0.03       0.28       0.00       0.01      -0.07
 -0.90        3.48      -0.28       2.32      -0.01       0.10      -0.84
 -0.85       13.10      -1.00       7.72      -0.04       0.33      -2.47
 -0.80       32.59      -2.35      16.66      -0.10       0.71      -4.76
 -0.75       64.48      -4.36      28.24      -0.19       1.21      -7.10
 -0.70      110.28      -6.95      40.73      -0.30       1.75      -8.73
 -0.65      170.38      -9.98      52.04      -0.43       2.23      -9.03
 -0.60      244.13     -13.20      60.11      -0.57       2.58      -7.67
 -0.55      329.91     -16.35      63.26      -0.70       2.71      -4.62
 -0.50      425.31     -19.16      60.41      -0.82       2.59      -0.20
 -0.45      527.29     -21.38      51.18      -0.92       2.19       5.04
 -0.40      632.37     -22.79      35.93      -0.98       1.54      10.39
 -0.35      736.81     -23.23      15.70      -1.00       0.67      15.11
 -0.30      836.83     -22.62      -7.92      -0.97      -0.34      18.52
 -0.25      928.73     -20.92     -32.98      -0.90      -1.41      20.11
 -0.20     1009.12     -18.18     -57.34      -0.78      -2.46      19.60
 -0.15     1074.98     -14.53     -78.89      -0.62      -3.38      16.98
 -0.10     1123.87     -10.12     -95.78      -0.43      -4.11      12.49
 -0.05     1153.95      -5.20    -106.54      -0.22      -4.57       6.62
  0.00     1164.10       0.00    -110.24       0.00      -4.72       0.00
  0.05     1153.95       5.20    -106.54       0.22      -4.57      -6.62
  0.10     1123.87      10.12     -95.78       0.43      -4.11     -12.49
  0.15     1074.98      14.53     -78.89       0.62      -3.38     -16.98
  0.20     1009.12      18.18     -57.34       0.78      -2.46     -19.60
  0.25      928.73      20.92     -32.98       0.90      -1.41     -20.11
  0.30      836.83      22.62      -7.92       0.97      -0.34     -18.52
  0.35      736.81      23.23      15.70       1.00       0.67     -15.11
  0.40      632.37      22.79      35.93       0.98       1.54     -10.39
  0.45      527.29      21.38      51.18       0.92       2.19      -5.04
  0.50      425.31      19.16      60.41       0.82       2.59       0.20
  0.55      329.91      16.35      63.26       0.70       2.71       4.62
  0.60      244.13      13.20      60.11       0.57       2.58       7.67
  0.65      170.38       9.98      52.04       0.43       2.23       9.03
  0.70      110.28       6.95      40.73       0.30       1.75       8.73
  0.75       64.48       4.36      28.24       0.19       1.21       7.10
  0.80       32.59       2.35      16.66       0.10       0.71       4.76
  0.85       13.10       1.00       7.72       0.04       0.33       2.47
  0.90        3.48       0.28       2.32       0.01       0.10       0.84
  0.95        0.34       0.03       0.28       0.00       0.01       0.07
  1.00        0.00       0.00       0.00       0.00       0.00       0.00


































Table 1 (continued)

Case: ρ = 0.8, N = 11. The functions give ordinates multiplied by 1000

Columns: r; normal-theory frequency function ψ(r, ρ); corrective functions for excess
ψ4,1(r, ρ), ψ4,2(r, ρ); corrective functions for skewness ψ6,1(r, ρ), ψ6,2(r, ρ), ψ6,3(r, ρ)

     r      ψ(r,ρ)  ψ4,1(r,ρ)  ψ4,2(r,ρ)  ψ6,1(r,ρ)  ψ6,2(r,ρ)  ψ6,3(r,ρ)
 -0.75        0.00       0.00       0.00       0.00       0.00       0.00
 -0.70        0.01       0.00       0.03       0.00       0.00      -0.02
 -0.65        0.02      -0.01       0.13       0.00       0.01      -0.09
 -0.60        0.04      -0.01       0.18       0.00       0.01      -0.11
 -0.55        0.06      -0.01       0.30       0.00       0.01      -0.21
 -0.50        0.10      -0.02       0.50       0.00       0.02      -0.34
 -0.45        0.17      -0.04       0.83       0.00       0.04      -0.55
 -0.40        0.27      -0.06       1.29       0.00       0.06      -0.82
 -0.35        0.43      -0.10       1.99       0.00       0.09      -1.24
 -0.30        0.65      -0.15       2.93      -0.01       0.13      -1.77
 -0.25        0.99      -0.23       4.31      -0.01       0.18      -2.52
 -0.20        1.49      -0.34       6.25      -0.01       0.27      -3.50
 -0.15        2.21      -0.49       8.92      -0.02       0.38      -4.77
 -0.10        3.27      -0.72      12.62      -0.03       0.54      -6.39
 -0.05        4.80      -1.03      17.63      -0.04       0.76      -8.39
  0.00        7.04      -1.48      24.49      -0.06       1.05     -10.81
  0.05       10.29      -2.12      33.66      -0.09       1.44     -13.59
  0.10       15.02      -3.01      45.87      -0.13       1.97     -16.61
  0.15       21.93      -4.28      61.91      -0.18       2.65     -19.47
  0.20       32.04      -6.06      82.62      -0.26       3.54     -21.50
  0.25       46.90      -8.55     108.79      -0.37       4.66     -21.45
  0.30       68.83     -12.04     140.78      -0.52       6.33     -17.22
  0.35      101.35     -16.89     177.98      -0.72       7.63      -5.53
  0.40      149.79     -23.59     217.38      -1.01       9.32      18.62
  0.45      222.30     -32.71     251.63      -1.40      10.78      62.01
  0.50      331.21     -44.85     265.07      -1.92      11.36     132.32
  0.55      495.09     -60.41     228.01      -2.59       9.77     234.63
  0.60      741.14     -78.97      89.30      -3.38       3.83     361.19
  0.625     906.37     -88.74     -41.30      -3.80      -1.77     422.36
  0.650    1107.09     -98.01    -227.73      -4.20      -9.76     468.29
  0.675    1349.30    -105.70    -481.08      -4.53     -20.62     481.08
  0.700    1638.62    -110.15    -808.55      -4.72     -34.65     434.53
  0.725    1979.08    -108.95   -1207.69      -4.67     -51.76     294.12
  0.750    2370.71     -98.76   -1656.60      -4.23     -71.00      20.32
  0.775    2805.67     -75.36   -2098.42      -3.23     -89.93    -420.67
  0.800    3261.77     -33.97   -2422.26      -1.46    -103.81   -1030.28
  0.825    3693.13      29.41   -2444.64       1.26    -104.77   -1740.21
  0.850    4017.99     115.07   -1916.42       4.93     -82.13   -2290.31
  0.875    4108.41     213.92    -602.72       9.17     -25.83   -2275.78
  0.900    3796.61     299.35    1471.74      12.83      63.07   -1154.60
  0.925    2936.70     322.29    3559.76      13.81     152.56    1126.23
  0.950    1590.43     231.87    3905.79       9.94     167.39    2975.12
  0.975     336.04      63.31    1450.35       2.71      62.16    1618.24
  1.000       0.00       0.00       0.00       0.00       0.00       0.00

















Table 1 (continued)

Case: ρ = 0.8, N = 21. The functions give ordinates multiplied by 1000

Columns: r; normal-theory frequency function ψ(r, ρ); corrective functions for excess
ψ4,1(r, ρ), ψ4,2(r, ρ); corrective functions for skewness ψ6,1(r, ρ), ψ6,2(r, ρ), ψ6,3(r, ρ)

     r      ψ(r,ρ)  ψ4,1(r,ρ)  ψ4,2(r,ρ)  ψ6,1(r,ρ)  ψ6,2(r,ρ)  ψ6,3(r,ρ)
 -0.25        0.00       0.00       0.00       0.00       0.00       0.00
 -0.20        0.00       0.00       0.00       0.00       0.00       0.00
 -0.15        0.01       0.00       0.09       0.00       0.00      -0.08
 -0.10        0.01       0.00       0.11       0.00       0.00      -0.10
 -0.05        0.03      -0.01       0.29       0.00       0.01      -0.26
  0.00        0.06      -0.01       0.56       0.00       0.01      -0.48
  0.05        0.14      -0.03       1.24       0.00       0.03      -1.03
  0.10        0.29      -0.07       2.40       0.00       0.06      -1.90
  0.15        0.63      -0.14       4.92       0.00       0.13      -3.70
  0.20        1.33      -0.29       9.58      -0.01       0.25      -6.72
  0.25        2.82      -0.59      18.62      -0.02       0.49     -12.09
  0.30        5.95      -1.18      35.42      -0.03       0.93     -20.85
  0.35       12.53      -2.37      66.05      -0.06       1.74     -34.33
  0.40       26.36      -4.70     119.95      -0.12       3.17     -62.59
  0.45       55.31      -9.20     209.75      -0.24       5.54     -71.51
  0.50      115.56     -17.63     347.20      -0.47       9.16     -76.26
  0.55      239.50     -32.77     525.23      -0.86      13.86     -29.42
  0.60      489.12     -58.07     672.94      -1.53      17.76     136.32
  0.625     692.81     -75.22     674.81      -1.99      17.81     287.09
  0.650     972.87     -94.94     567.99      -2.51      14.99     485.31
  0.675    1350.45    -115.65     287.63      -3.05       7.59     710.72
  0.700    1845.86    -134.01    -236.95      -3.54      -6.25     908.32
  0.725    2471.22    -144.14   -1053.55      -3.80     -27.80     972.37
  0.750    3216.90    -137.12   -2132.03      -3.62     -56.26     749.64
  0.775    4030.04    -101.83   -3274.81      -2.69     -86.42      81.42
  0.800    4787.20     -28.65   -4029.94      -0.76    -106.35   -1059.41
  0.825    5274.55      82.12   -3710.40       2.17     -97.91   -2341.78
  0.850    5209.97     207.44   -1729.79       5.47     -45.65   -2914.05
  0.875    4364.79     293.73    1593.30       7.75      42.05   -1781.58
  0.900    2814.91     277.84    4281.50       7.33     112.98     821.21
  0.925    1157.18     156.15    3854.73       4.12     101.72    2362.23
  0.950     195.96      34.73    1192.94       0.92      31.48    1124.91
  0.975       3.29       0.75      33.32       0.02       0.88      44.26
  1.000       0.00       0.00       0.00       0.00       0.00       0.00





































The fact that the expression on the right-hand side of (54) is always positive also follows
directly from the following consideration. By writing $\mu_{20} = \sigma_1^2$, $\mu_{02} = \sigma_2^2$ and $\mu_{11} = \rho\sigma_1\sigma_2$, it
will be seen that


varroan tlle) Can) ela) Cae) Ca) Ca) 
eave (Se, | 


>0, (54) bis 


where $\xi$ and $\eta$ are the population means of $x$ and $y$.

Obviously, neither the L-coefficients nor the semi-invariants themselves can take arbitrary 
values. For various inequality relations are known to exist between similar order semi- 
invariants of a population. In order to see how large these L-coefficients can be in practice, 
we shall consider a few numerical illustrations from the studies of various workers. With 
a short description of the source of data, Table 2 shows the semi-invariants of a few non- 
normal distributions, arranged in order of magnitude of the correlation coefficient p. As the 
number of observations in all these examples is large, it is not unlikely that the calculated 
semi-invariants are truly representative of their population values. In some of the cases the 
Edgeworth surface (1) may not afford a very satisfactory representation of the population. 
But whatever may be the population form, as the effect of the first power of the fourth-order 
and the square and product of the third-order semi-invariants can be correctly measured by 
the factors (51) and (52), it seems worth while to examine the values of the L-coefficients for 
all these populations. They are shown in the last columns of Table 2. 

The values of the semi-invariants for populations I (no. of trumps in deals of whist) and 
III (weights of placenta and newborn child (boys)) as given by Wicksell, are correct up to 
four figures only. So that in the present, as well as in the subsequent illustrations, the 
calculated values of the constants for these populations may be in error in the last one or two 
figures. 

The population of contemporaneous barometric heights at Southampton and Laudale
(no. IV in Table 2) might be considered as moderately non-normal. Pearson (1925) fitted his
fifteen constant bivariate frequency surface (which is the same as the bivariate Gram-Charlier
surface including terms as far as those in $\lambda_{40}$, $\lambda_{31}$, $\lambda_{22}$, $\lambda_{13}$ and $\lambda_{04}$ only) to the
correlation table of this distribution. Regarding the goodness of fit, he says, 'If the
15-constant surface actually describes the material sampled, we should expect a worse result
than that observed in one out of five or six trials. Such a result is sufficiently encouraging
to make it worthwhile investigating the 15-constant surface on further material.' The
Edgeworth population (1), on the other hand, as it considers a further set of terms in
$\lambda_{30}^2, \lambda_{30}\lambda_{21}, \ldots, \lambda_{03}\lambda_{12}$ and $\lambda_{03}^2$, will certainly have higher representational power in such cases
than the 15-constant surface. In what follows, we shall investigate in some detail, by
applying our theoretical formulae, the sampling distribution of $r$ for such a parent population.
The measures of departure from normality being moderate, our formulae should work well
in this case.














Table 2. The population parameters of a few non-normal distributions and the corresponding values of the L-coefficients

[Table printed sideways in the original. Columns: description, no. of observations and source of data; population value of ρ; population semi-invariants (third order, fourth order); corrective coefficients from equation (33); corrective coefficients from equation (34) for N = 11. Rows: I, no. of trumps in deals of whist; II, barometric heights on alternate days; III, weights of newborn child and placenta (boys) (Wicksell, 1917); IV, contemporaneous barometric heights, Southampton and Laudale (Pearson, 1925); V, length and breadth of beans (Wicksell, 1917, after Johannsen).]















































Illustration
For the frequency distribution IV of Table 2, Karl Pearson has given the following values
of $q_{ij}$ (see expression (40) above), for $i+j = 3$ and 4:

ρ = q11 = 0.780,819
q30 = 0.413,691        q40 = 3.612,028
q21 = 0.286,962        q31 = 2.676,586
q03 = 0.473,852        q13 = 2.532,831
                       q04 = 3.194,947.
Fig. 1a. The frequency curve of r (N = 11, ρ = 0.8) in samples from: (i) a normal population (full line);
(ii) a non-normal population (based on contemporaneous barometric heights distribution) (broken line).

Fig. 1b. The frequency curve of r (N = 21, ρ = 0.8) in samples from: (i) a normal population (full line);
(ii) a non-normal population (based on contemporaneous barometric heights distribution) (broken line).

To cut down the numerical work, we shall take

ρ = q11 = 0.8
λ30 = q30 = 0.4        λ40 = q40 - 3 = 0.6
λ21 = q21 = 0.3        λ31 = q31 - 3ρ = 0.3
λ12 = q12 = 0.3        λ22 = q22 - 2ρ² - 1 = 0.12
λ03 = q03 = 0.5        λ13 = q13 - 3ρ = 0.1
                       λ04 = q04 - 3 = 0.2










and proceed to investigate the theoretical distribution of $r$, in samples of $N = 11$ and $N = 21$
for the population specified by the above set of values of $\rho$ and $\lambda$'s. The approximations thus
considered will enable us to utilize our tabulated values of the ordinates of the corrective
functions for $\rho = 0.8$ and $N = 11$ and 21.
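The rounding of the fourth-order constants can be retraced with a few lines of arithmetic. The sketch below assumes the relations $\lambda_{40} = q_{40} - 3$, $\lambda_{31} = q_{31} - 3\rho$, $\lambda_{13} = q_{13} - 3\rho$ and $\lambda_{04} = q_{04} - 3$, takes $\rho = 0.8$, and reads Pearson's printed values as $q_{40} = 3.612028$, $q_{31} = 2.676586$, $q_{13} = 2.532831$ and $q_{04} = 3.194947$; that assignment of labels is an interpretation of the listing above, and the helper name is illustrative.

```python
def fourth_order_lambdas(rho, q40, q31, q13, q04):
    """Fourth-order semi-invariants from standardized moments (unit variances)."""
    return {
        "lam40": q40 - 3.0,
        "lam31": q31 - 3.0 * rho,
        "lam13": q13 - 3.0 * rho,
        "lam04": q04 - 3.0,
    }

if __name__ == "__main__":
    lam = fourth_order_lambdas(0.8, 3.612028, 2.676586, 2.532831, 3.194947)
    # Round to one decimal place, as in the simplified population of the illustration.
    print({k: round(v, 1) for k, v in lam.items()})
```

Rounded to one decimal place, these agree with the simplified values 0.6, 0.3, 0.1 and 0.2 adopted for the illustration.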


Table 3. The probability integrals of the frequency distribution of r, derived from formula (32),
in samples from the non-normal population IV, based on the frequency of contemporaneous
barometric heights with the corresponding normal-theory values

Probability integral for the distribution of r (ρ = 0.8)

                     N = 11                        N = 21
     r      Population IV      Normal     Population IV      Normal
                            population                    population
 -0.10         0.00006        0.00040        0.00000        0.00000
  0.00         0.00019        0.00089        0.00000        0.00000
  0.10         0.00056        0.00194        0.00000        0.00002
  0.20         0.00154        0.00419        0.00000        0.00009
  0.30         0.00413        0.00900        0.00000        0.00040
  0.40         0.01085        0.01940        0.00011        0.00177
  0.50         0.02830        0.04223        0.00215        0.00781
  0.60         0.07364        0.09311        0.02052        0.03383
  0.65         0.11807        0.13872        0.04945        0.06911
  0.70         0.18787        0.20658        0.11416        0.13761
  0.75         0.29451        0.30596        0.24439        0.26217
  0.80         0.44876        0.44643        0.46649        0.46323
  0.825        0.54432        0.53348        0.60731        0.58983
  0.850        0.64901        0.63020        0.75058        0.72237
  0.875        0.75666        0.73243        0.87414        0.84377
  0.900        0.85650        0.83226        0.95766        0.93457
  0.925        0.93573        0.91764        0.99449        0.98353
  0.950        0.98254        0.97489        0.99914        0.99848
  0.975        0.99876        0.99776        0.99999        0.99998
  1.000        1.00000        1.00000        1.00000        1.00000























In order to avoid confusion in the latter part of this work, we shall denote Karl Pearson's
(1925) distribution of contemporaneous barometric heights as the 'Population IV', and the
present approximated distribution based on the actual one as the 'Population IV′'.

Now the L-coefficients for this population IV′ are found to be

L4,1 = 0.5120,  L4,2 = -0.1344,
L6,1 = -0.2053,333 for N = 11, and -0.2425,263 for N = 21,
L6,2 = -0.5216,000 for N = 11, and -0.4668,631 for N = 21,
L6,3 = 0.0307,200 for N = 11, and 0.0259,199 for N = 21.
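The two fourth-order coefficients can be recomputed from (33), taken here as $L_{4,1} = 3\rho(\lambda_{40}+\lambda_{04}) - 4(\lambda_{31}+\lambda_{13}) + 2\rho\lambda_{22}$ and $L_{4,2} = \rho^2(\lambda_{40}+\lambda_{04}) - 4\rho(\lambda_{31}+\lambda_{13}) + 2(2+\rho^2)\lambda_{22}$, with the simplified population values $\rho = 0.8$, $\lambda_{40} = 0.6$, $\lambda_{31} = 0.3$, $\lambda_{22} = 0.12$, $\lambda_{13} = 0.1$, $\lambda_{04} = 0.2$. The function name is illustrative; the sixth-order coefficients of (34) are not attempted in this sketch.

```python
def l_fourth_order(rho, lam40, lam31, lam22, lam13, lam04):
    """L_{4,1} and L_{4,2} of equation (33)."""
    l41 = 3.0 * rho * (lam40 + lam04) - 4.0 * (lam31 + lam13) + 2.0 * rho * lam22
    l42 = (rho**2 * (lam40 + lam04) - 4.0 * rho * (lam31 + lam13)
           + 2.0 * (2.0 + rho**2) * lam22)
    return l41, l42

if __name__ == "__main__":
    l41, l42 = l_fourth_order(0.8, 0.6, 0.3, 0.12, 0.1, 0.2)
    print(round(l41, 4), round(l42, 4))  # 0.512 -0.1344
```

Both values agree with the figures quoted above for $L_{4,1}$ and $L_{4,2}$.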


Comparative values of the probability integral of the distribution of r, for the normal 
population and the non-normal population IV will be seen in Table 3. As the tables for the 













ordinates and the probability integrals of normal-theory $r$ are available, correct to five
decimal figures only, the last one or two digits of the tabulated results in the non-normal case
may be in error. Comparative frequency curves of $r$ for the normal and non-normal cases will
be seen in Figs. 1a, b. The population value $\rho = 0.8$ has been marked by an ordinate in each
diagram.


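5. THE MOMENT CONSTANTS AND THE EFFICACY OF THE DISTRIBUTION OF
$z = \tfrac12\log_e\{(1+r)/(1-r)\}$, IN SAMPLES FROM A NORMAL POPULATION: A CORRECTION
TO SOME FORMULAE OF R. A. FISHER
Suggesting the transformation

$$z = \tfrac12\log_e\frac{1+r}{1-r} \quad\text{and}\quad \zeta = \tfrac12\log_e\frac{1+\rho}{1-\rho} \tag{56}$$

for the sample correlation coefficient $r$ and its population value $\rho$, Fisher (1921) gave the
asymptotic expansion* for the frequency function of $r$, in samples from a normal population,
in powers of $x = z - \zeta$ and inverse powers of $(N-1)$, $N$ being the size of sample. The
expressions for the moment-constants of $x$ about $x = 0$ and $x = \bar{x}$, as well as the first two
$\beta$-coefficients, were also given in that connexion. Unfortunately, an error has led to most of
these formulae being incorrect in certain terms. The correct expressions are given below.
Thus we have for the raw moments,

$$\mu'_1(x) = \frac{\rho}{2(N-1)}\left\{1 + \frac{5+\rho^2}{4(N-1)} + \frac{11+2\rho^2+3\rho^4}{8(N-1)^2} + \ldots\right\}, \tag{57}$$

$$\mu'_2(x) = \frac{1}{N-1}\left\{1 + \frac{8-\rho^2}{4(N-1)} + \frac{88-9\rho^2-9\rho^4}{24(N-1)^2} + \ldots\right\}, \tag{58}$$

$$\mu'_3(x) = \frac{3\rho}{2(N-1)^2}\left\{1 + \frac{13+2\rho^2}{4(N-1)} + \ldots\right\}, \tag{59}$$

$$\mu'_4(x) = \frac{1}{(N-1)^2}\left\{3 + \frac{28-3\rho^2}{2(N-1)} + \frac{736-84\rho^2-51\rho^4}{16(N-1)^2} + \ldots\right\}, \tag{60}$$

for the central moments

$$\mu_2(z) = \frac{1}{N-1}\left\{1 + \frac{4-\rho^2}{2(N-1)} + \frac{22-6\rho^2-3\rho^4}{6(N-1)^2} + \ldots\right\}, \tag{61}$$

$$\mu_3(z) = \frac{\rho^3}{(N-1)^3} + \ldots, \tag{62}$$

$$\mu_4(z) = \frac{1}{(N-1)^2}\left\{3 + \frac{14-3\rho^2}{N-1} + \frac{184-48\rho^2-21\rho^4}{4(N-1)^2} + \ldots\right\}, \tag{63}$$

and for the $\beta$-coefficients

$$\beta_1(z) = \frac{\rho^6}{(N-1)^3} + \ldots, \tag{64}$$

$$\beta_2(z) = 3 + \frac{2}{N-1} + \frac{4+2\rho^2-3\rho^4}{(N-1)^2} + \ldots. \tag{65}$$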

The formulae in their incorrect form have passed into general use, as most of the workers 
on the present problem have used them for comparison of their experimental results and the 
corresponding normal-theory approximation by z-transformation. The corrections affect, 
for example, Table V of E. S. Pearson’s (1929) paper and Table VIII in the Introduction to 

* There is an obvious printer’s error in the third term of the expression within curly brackets (p. 13). 


a 2+p 
It should be ———— q 
sho’ 3-1) D instead of &(n—1) (in Fisher's notation) 











A. K. GAYEN 


David’s (1938) Tables of the Distribution of the Correlation Coefficient. In general, the approxi- 
mations based on the corrected formulae show that Fisher’s z-transformation provides 
even more accurate results than were suggested by the comparisons quoted by these authors.* 
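The corrected series of this section are easy to evaluate numerically. The following is only an illustrative sketch, not part of the original paper: the function name and the dictionary keys are our own, the mean is truncated after the term of order (N − 1)⁻² in (57), and the other quantities are truncated as in (61)-(65).

```python
import math


def z_moments_normal(rho, n):
    """Truncated normal-theory series for Fisher's z (illustrative helper).

    rho is the population correlation, n the sample size N.
    """
    m = n - 1
    r2, r4 = rho ** 2, rho ** 4
    zeta = math.atanh(rho)  # zeta = (1/2) log_e((1 + rho) / (1 - rho)), as in (56)
    # (57), keeping terms up to order (N - 1)^-2 only
    mean = zeta + rho / (2 * m) * (1 + (5 + r2) / (4 * m))
    # (61): variance of z
    var = (1 + (4 - r2) / (2 * m) + (22 - 6 * r2 - 3 * r4) / (6 * m ** 2)) / m
    # (62) and (63): third and fourth central moments
    mu3 = rho ** 3 / m ** 3
    mu4 = (3 + (14 - 3 * r2) / m + (184 - 48 * r2 - 21 * r4) / (4 * m ** 2)) / m ** 2
    # (64) and (65): beta-coefficients
    beta1 = rho ** 6 / m ** 3
    beta2 = 3 + 2 / m + (4 + 2 * r2 - 3 * r4) / m ** 2
    return {"mean": mean, "var": var, "mu3": mu3, "mu4": mu4,
            "beta1": beta1, "beta2": beta2}


if __name__ == "__main__":
    for n in (11, 21):
        mom = z_moments_normal(0.8, n)
        print(n, round(mom["mean"], 4), round(math.sqrt(mom["var"]), 4),
              round(mom["beta2"] - 3, 4))
```

For ρ = 0.8 and N = 11 or 21 these truncations give values of mean z, σ_z and γ₂(z) = β₂(z) − 3 that agree, to the four decimals printed, with the normal-population figures quoted later in the paper.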


6. THE DISTRIBUTION OF FISHER’S z, IN SAMPLES FROM A NON-NORMAL POPULATION 
Let us apply the transformation (56) to the frequency function (32). Writing

$$ \chi(z,\zeta) = \frac{N-2}{\pi}\,\mathrm{sech}^{N-1}\zeta\;\mathrm{sech}^{N-2}z \int_0^\infty \frac{dt}{(\cosh t - \rho r)^{N-1}}, \tag{66} $$

which is the normal-theory probability density of z, the transformed frequency function can
be put in the form


N o? 
fe) = x@.0+ snare Haag 0+ La aeaxte.0)| 


N-2 a) é8 
+ TON + Haves lo% 6.195% x(z, f)+ Las gaXle,) +Lasz5xt2.0)}. (67) 


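For the direct evaluation of the moments of z about ζ, from (67), we shall come across the
integrals

$$ Q(k,\nu) = \int_{-\infty}^{\infty} (z-\zeta)^k\,\frac{\partial^\nu}{\partial\rho^\nu}\chi(z,\zeta)\,dz, \tag{68} $$

which can be expressed in terms of the derivatives with respect to ρ of the normal-theory
moments

$$ \mu'_k(z) = \int_{-\infty}^{\infty} (z-\zeta)^k\,\chi(z,\zeta)\,dz. \tag{69} $$

In the present case, for ν = 1, 2 and 3, the integrals (68) will be found to be

$$ Q(k,1) = \frac{\partial}{\partial\rho}\mu'_k(z) + \frac{k}{1-\rho^2}\,\mu'_{k-1}(z), \tag{70} $$

$$ Q(k,2) = \frac{\partial^2}{\partial\rho^2}\mu'_k(z) + \frac{2k}{1-\rho^2}\frac{\partial}{\partial\rho}\mu'_{k-1}(z) + \frac{2k\rho}{(1-\rho^2)^2}\,\mu'_{k-1}(z) + \frac{k(k-1)}{(1-\rho^2)^2}\,\mu'_{k-2}(z), \tag{71} $$

and

$$ \begin{aligned} Q(k,3) = {} & \frac{\partial^3}{\partial\rho^3}\mu'_k(z) + \frac{3k}{1-\rho^2}\frac{\partial^2}{\partial\rho^2}\mu'_{k-1}(z) + \frac{6\rho k}{(1-\rho^2)^2}\frac{\partial}{\partial\rho}\mu'_{k-1}(z) + \frac{3k(k-1)}{(1-\rho^2)^2}\frac{\partial}{\partial\rho}\mu'_{k-2}(z) \\ & + \frac{2(1+3\rho^2)k}{(1-\rho^2)^3}\,\mu'_{k-1}(z) + \frac{6\rho k(k-1)}{(1-\rho^2)^3}\,\mu'_{k-2}(z) + \frac{k(k-1)(k-2)}{(1-\rho^2)^3}\,\mu'_{k-3}(z). \end{aligned} \tag{72} $$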
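The structure of (70)-(72) can be seen from a one-step recurrence. What follows is our own sketch rather than the author's derivation; it assumes only that differentiation under the integral sign is permissible and that dζ/dρ = 1/(1 − ρ²), which follows from (56).

```latex
% Differentiate Q(k, nu) with respect to rho; zeta depends on rho through (56).
\frac{\partial}{\partial\rho}\,Q(k,\nu)
  = Q(k,\nu+1) - k\,\frac{d\zeta}{d\rho}\,Q(k-1,\nu)
\quad\Longrightarrow\quad
Q(k,\nu+1) = \frac{\partial}{\partial\rho}\,Q(k,\nu) + \frac{k}{1-\rho^{2}}\,Q(k-1,\nu),
\qquad Q(k,0) = \mu'_k(z).
% nu = 0 gives (70). Repeating the step gives (71) and (72), using
\frac{d}{d\rho}\,\frac{1}{1-\rho^{2}} = \frac{2\rho}{(1-\rho^{2})^{2}},
\qquad
\frac{d^{2}}{d\rho^{2}}\,\frac{1}{1-\rho^{2}} = \frac{2(1+3\rho^{2})}{(1-\rho^{2})^{3}}.
```

The two derivatives on the last line account for the coefficients 2kρ and 2(1 + 3ρ²)k that appear in the integrals for ν = 2 and ν = 3.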


The asymptotic formulae for the first four moments of z about ¢ for the distribution (67) 
may be obtained by using the corresponding normal-theory results, (57)—-(60), in the above 
integrals (70)-(72). The expressions for the mean, the variance and the third and the fourth 
moments about the mean will be found to be 


mean z = b+oy— D fg > a= — p*) (Ago + Aga) — 4(1 +?) (Agy + Aga) + 20(5 +P?) Deal 
+ar 1 1? {bets +p 2) — — 16(1 —pa2l30(5— p*) (Ago t+ Aoa) pm 4(5 iz 8p? —p*) (As + “) 


+ 2p(29 + 8p? — p*) Age] — iain 5 — 10p? + 3p*) (A8y + Az) 

+ 3p(15 + 10p? — pt) (AZ, + AZy) — 4(1 + 3p?) Ago Agg — 12(2 + 9p? + p*) Agy Age 

—6(8 + 6p*—p*) (Aso + Aoodaa)+ 120(8-+p") (Agora + Aone] (73) 
* (The corrections involved to Table VIII of David’s (1938) Tables of the Distribution of the Correlation 


Coefficient have been kindly supplied by Dr Gayen and will be included on an Errata Sheet issued with 
copies of these Tables sold in future. ED.) 











238 Frequency distribution of the product-moment correlation coefficient 





var z= iW 1) ( + (i — al Pet Noa) — 4P(Agy + Ags) + 2(2 +p?) dea} 


+ peal Bt 0 gpa Ph +0) an + Aad) 4002+ 04) an +A) 
+ 2(4-+ 4p + 94) Aaa] — 575 —saye LO" — 9") (Ado + Ab) 
+ (4418p? + pt) (AB, +-A2,) — 4AgoAgg — 120(2 +2) Agr Ars 
— 20(5 + p*) (AgpAgy + Aon Are) + 4(1 + 20%) (Ago Aaa + Assn) 


i ie 1 os 2_ 4 a a) ‘ 
+ mip (te? 6p* — 3p + aa pple (7+ 12? — 5p*) (Ago + Aga) 
— 4p(9 + 8p? — 3p*) (Ag, + Ay) + 2 (20 + 21p? + 4p* — 3p) Ago] 
] 
+ 57 —payalP*(2 — 3p*— 2pt) (Ada + Aga) + (32 + 990% + 15pt— 2p*) (Ab, + Ads) 


— 8p(3 + p?) Ago Ags — 360(5 + 3p?) Agi Ags 
— 2p(35 + 15p* — 2p*) (AggAgy + AggAje) + 4(6 + 17? + p*) (Ago Ais + Ago] » (74) 





1 
wal) = oye eT —payel PM Ada + Ab) — 90C2 + 08) (Ab, + AB) + 2A Aog 


+ 6(1 + 2p?) Agy Ags + 697(Ago Aas + AggAre) — 6P(Ago Are + Ao Aan} 





1 ] 
} = 1) te" ca 4(1 —p?) [3p?(Ago 5 Agu) = 12p7(Ag, + Ays) + 6p({2 +p?) Age] 


+3 ii — pal (13 + 3p%) (Ago + A§s) + 90(10 + 5p* + pt) (AG, + A.) 
4 ra + 9p) Aso Ags — 18(5 + 9p? + 2p) Ags Aye 
— 6p7(13 + 3p?) (Ago Ags + Ags Arg) + 6P(11 + 5p?) (Ago Aye + Aos 0} » (75) 





1 
M,(2) = TW 1 {3+ + x1 —p% [3p?(Ago + Agg) — 12p(Agy + Ag) + 6(2 + p*) deal 
1 1 
% (W—1) 4 a = 1 — pay Aeo + Agq) — 3693(Ag, + Ayg) + 1897(2 + p?) Agg] 
+ fi = paleo — p*) (Ag + Ags) + 3(4 + 1 7p? + 3p*) (AZ, + AZ,) — LEPAgg Ags 


— 12p(7 + 5p?) Ag, Aye — 69(5 + A (AgoAg: + Agg Ara) + 12(1 + 3p?) (Ago Ags + Ags Ac1)] 

1 
+a car —aya 184 48" 21p*) + ma si paleX t+ 36p?— 27p4) (Ago + Agy) 

— 4p(25 + 24p? — 21 pt) (Ag; + Aya) + 2(56 + 61p? — 124 — 21p*) Ag] 

1 
+5 x1 —p)2 [3p?(30 + 3p? — 7p*) (AZ, + AZ,) + 9(16 + 54p? + 11 p* — 3p) (Az, + AZ.) 

— 129(9 + 4p*) Ago Ags — 36p(23 + 1 7p? — p*) Agi Ars 

~ 89(60-+ 37p*— Opt) AAs + AgsAia) +12(8-+ 80p*+ p4) Rapin +AesAn)I}, (78) 


from which we obtain, omitting terms of negligible order, 
3 1 
+ 6(1 + 2p?) Ags Ags + 697(AzqAo1 + Agg Ar) — 6P(AgoArz + AosAzi)] 


1 3p(2—p? 
+ WI (o* mn ae [0?(Ago + Age) ~ 40(Agi + Aas) + 2(2 + 97) Ags] 





2 Peer [25p%(ASq + Ags) + 90(18 + 7p) (AR, + Af) 
— 2(19 + 6p) Ago Ags — 18(9 + 16?) Agy Aye 





— 150p*(Ago Agi + Agg Ara) + 60(23 + 29?) (Ago Are + Ags An])| ’ (77) 
and 
Pale) = 8+ pp |2— Ga ppyalP n+ Roe) — 400Aar + Aas) + 202+ p*) Aa] 


+ Ty apaypl2PNO— 294) (Ady + Ab) + 6(4-+ 1598+ 204) (AB, + AB) 
— 28PAgqAog — 12p(13 + 8p?) Agi Ars 
—129(5-+ 2p") (Lapa + Aang) + 12(2 + 5p) AgoAaa + oon} 
1 (4+ 5p? — 3p4 
cy mapi|t + 20t— Bot + See PCat ae + Raw) — 40 Aan + Aaa) + 2(2 + 0%) Aga 
1 
* (1 — pays (66 — 35p* + Tp*) (Ag + AGs) 
+ (88 + 292p? — 35p* — 3p®) (Ag, + A2,) — 2:0(49 — 11?) Ago Ags 
— 6p(103 + 17p? — 6p*) Agy Aya — 2p(110 + 7p? — 3p) (Ago Aai + AogArz) 


+ 2(44+91p*— 219) AggAiz+AseAn)]}- (78) 


The corrective terms in the expressions for the moment constants of z, as they contain
powers of (1 − ρ²) in the denominator, may appear at first sight not to be valid near ρ = 1.
But the values taken by the expressions in the square brackets in the numerators will prevent
the whole terms from becoming large. For it will be seen that for ρ = 1, the numerators of
all corrective terms reduce to one or other of the forms

const. (λ₄₀ − 4λ₃₁ + 6λ₂₂ − 4λ₁₃ + λ₀₄),
and const. (λ₃₀ − 3λ₂₁ + 3λ₁₂ − λ₀₃)²,

which also are known (Kendall, 1949, result (26)) to be zero, in the case ρ = 1. It may be
possible, however, to find limits of the whole terms for given values of ρ, but investigations
on these lines have not been fruitful. It will be noticed that the above corrective terms bear
interesting analogies to those in the expressions for the mean and variance of t (the sample
value of Kendall's τ) given by Kendall (1949).
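The reduction at ρ = 1 is easy to check for the leading corrective terms. The sketch below is illustrative only; it uses the square-bracket expressions that multiply (N − 1)⁻¹ in mean z and in var z, and the function names and the grouping into three coefficients are our own bookkeeping.

```python
def mean_bracket(rho):
    """Coefficients of (l40 + l04), (l31 + l13) and l22 in the leading
    corrective term of mean z (the l's stand for the population lambdas)."""
    return (rho * (3 - rho ** 2), -4 * (1 + rho ** 2), 2 * rho * (5 + rho ** 2))


def var_bracket(rho):
    """The same three coefficients for the leading corrective term of var z."""
    return (rho ** 2, -4 * rho, 2 * (2 + rho ** 2))


# In this grouping the form l40 - 4*l31 + 6*l22 - 4*l13 + l04 has coefficients (1, -4, 6).
BINOMIAL_FORM = (1, -4, 6)

if __name__ == "__main__":
    print(mean_bracket(1), var_bracket(1))
```

At ρ = 1 the first bracket is twice the binomial form and the second is the binomial form itself, which is the first of the two forms quoted above.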




7. THE EFFECT OF THE POPULATION EXCESS AND SKEWNESS ON THE DISTRIBUTION OF z 


As bef i " 
s kefore, we shall write Xa1(2, 6) = aN 1) ap Xes 
N-1) @ 
Xe a(2, ¢) = aarp ape ), 


(79) 













for the corrective factors in the distribution of z, (67), due to the population values of the 
constants L₄,₁ and L₄,₂; and 














(N —2) a 
Xe,1(2; ¢) ba [2NW +1) (N43) 9% >) 
N-2 o 
Xe, 2(2; ¢) _ Env Neat ¢), (80) 
if (N —2) as 
Xe,3(2, ¢) oe T2N(N +1) (NW +3) qper™ 9)» 


for those due to the values of L₆,₁, L₆,₂ and L₆,₃, which, as has been pointed out, depend on the 
sample size as well. 

The total effect of the population excess and skewness on the distribution of z will be
studied in some detail by the help of the moment constants deduced in (73)-(78). The
expressions for β₁(z) and β₂(z) show that the distribution of z is asymptotically normal,
although the approach towards normality is not so rapid as in the normal-theory case. It
will be noticed that the expression for β₁(z), in (77), begins with a term in (N − 1)⁻², whereas
in the normal-theory case it starts with the next higher order term, namely that in (N − 1)⁻³.
This effect on β₁(z) is due to skewness in the population, as measured by λ₃₀², λ₃₀λ₂₁, …, λ₀₃λ₁₂
and λ₀₃². In both the normal and non-normal cases, however, β₁(z) vanishes absolutely
for ρ = 0. Considering terms up to O[(N − 1)⁻¹], the value of β₂(z), (78), is found to be
influenced by the correlation and also the excess and skewness of the population; whereas
in the normal-theory case it does not even depend on ρ.

These observations, it may be emphasized at this point, apply only for moderately non-normal
populations. Lower order terms in (N − 1)⁻¹, containing powers of λ₃₀², λ₃₀λ₂₁, …, λ₀₃²;
λ₄₀, λ₃₁, …, λ₀₄, have not been taken into account in the above expressions for the β's. If in
particular the squares and products of λ₃₀², λ₃₀λ₂₁, …, λ₀₃² of a population are not small, the
value of β₁(z) may be considerable as in that case its expression will begin with a term in
(N − 1)⁻¹. Accordingly, however large the sample size is, the formula (77) holds good only in
cases where population λ's other than those considered are negligibly small.

For the values of ρ given for the five populations listed in Table 2, and for N ≥ 21, β₁(z)
is always < 0.00003 in samples from normal populations. Using equations (74) and (75) with
the λ's of Table 2, it was found for N ≥ 21 that while β₁(z) had larger values than on the
'normal-theory' it was still always below 0.002. For γ₂(z) = β₂(z) − 3, the comparison is
summarized in Table 4.

As the illustrations taken are representative of a fairly wide class of non-normal populations,
it may be inferred from the tabulated results that the distribution of z is still in general
closely approximated by the normal curve if the samples are not too small.

Considering terms up to O[(N − 1)⁻¹], the mean and variance of z, (73) and (74), are
independent of the population skewness, showing that its effect on them is less serious. They
are mainly influenced by the excess of the population. The mean value of z exceeds the
population value by about

$$ \frac{1}{N-1}\left\{\tfrac{1}{2}\rho + \frac{1}{8(1-\rho^2)^2}\left[\rho(3-\rho^2)(\lambda_{40}+\lambda_{04}) - 4(1+\rho^2)(\lambda_{31}+\lambda_{13}) + 2\rho(5+\rho^2)\lambda_{22}\right]\right\}. $$

Compared to ½ρ, the values of the second term of the expression within the curly bracket
are not small for the populations of Table 2, showing that the bias introduced by
non-normality may be considerable. Using the full expressions (73) and (74) the values of
mean z − ζ, and standard deviation of z, (σ_z), have been calculated for a few representative
values of N for each of the above populations. They are shown in Tables 5 and 6 against their
corresponding normal-theory values. Differences between the mean values and the ratios
of the standard deviations are also tabled. It is worth noting that with increasing sample
size, the ratio of standard deviations hardly ever tends to unity. This shows that the variance
of z is very sensitive to changes in the population form. In most cases of large samples the
actual variance is found to be widely divergent from its normal-theory value. The magnitude
of the differences in mean z, however, diminishes gradually (although not very rapidly) as
the sample size increases.


Table 4. The comparative values of γ₂(z) = β₂(z) − 3 of the distribution of z, in samples
from normal and non-normal populations

                                         Population   Values of γ₂(z) = β₂(z) − 3 from formula (78)
      Population                         value of ρ   N = 21     N = 51    N = 101   N = 201   N = 401

I     Trumps                              −0.3305     0.128      0.050     0.025     0.012     0.006
II    Full year barometric heights         0.5807     0.043      0.011     0.004     0.002     0.001
III   Newborn child                        0.6455     0.282      0.118     0.060     0.030     0.015
IV    Contemporaneous barometric           0.7808     0.130      0.036     0.015     0.007     0.003
      heights*
V     Beans                                0.7811     0.362      0.099     0.042     0.019     0.009

      Normal population with the                      0.111 or   0.042     0.020     0.010     0.005
      values of ρ as above                            0.110

* Note that the figures in this row are for the original population IV (see p. 235 above).


[Fig. 2a. Comparative frequency curves for z = ½ log_e {(1+r)/(1−r)} (N = 11, ρ = 0.8) in samples from:
(i) a normal population (full line); (ii) a non-normal population (based on contemporaneous
barometric heights distribution) (broken line).]

[Fig. 2b. Comparative frequency curves for z = ½ log_e {(1+r)/(1−r)} (N = 21, ρ = 0.8) in samples from:
(i) a normal population (full line); (ii) a non-normal population (based on contemporaneous
barometric heights distribution) (broken line).]
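Why the ratio of standard deviations need not tend to unity can be read from the leading term of (74): to order (N − 1)⁻¹, var z is the normal-theory value 1/(N − 1) multiplied by a factor free of N. A minimal sketch follows, assuming only that leading term. The function name and argument order are illustrative, and since the λ's of Table 2 are not reproduced here the non-zero inputs in the usage lines are made-up values, not those of any tabulated population.

```python
import math


def limiting_sd_ratio(rho, l40, l31, l22, l13, l04):
    """Large-sample ratio sigma_z(non-normal) / sigma_z(normal), taken from the
    leading term of (74); l40 ... l04 are the fourth-order population lambdas."""
    bracket = (rho ** 2 * (l40 + l04)
               - 4 * rho * (l31 + l13)
               + 2 * (2 + rho ** 2) * l22)
    return math.sqrt(1 + bracket / (4 * (1 - rho ** 2) ** 2))


if __name__ == "__main__":
    # Normal population: every lambda is zero, so the ratio is exactly 1.
    print(limiting_sd_ratio(0.6, 0, 0, 0, 0, 0))
    # Hypothetical excess (invented lambdas): the ratio settles away from 1.
    print(limiting_sd_ratio(0.6, 0.5, 0.3, 0.2, 0.3, 0.5))
```

Because the factor does not involve N, any departure of the ratio from unity at this order persists however large the sample, which is the behaviour seen in the ratio columns of Table 6.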


Table 5. The comparative values of (mean z − ζ) of the distribution of z, in samples from
normal and non-normal populations

                                       Population               N = 21                                N = 51
      Population                       value of ρ   (i) Non-    (ii)        Difference    (i) Non-    (ii)        Difference
                                                    normal      Normal      |(i) − (ii)|  normal      Normal      |(i) − (ii)|

I     Trumps                            −0.3305     −0.01089    −0.00774    0.00315       −0.00430    −0.00322    0.00108
II    Full year barometric heights       0.5807      0.02419     0.01549    0.00870        0.00979     0.00596    0.00383
III   Newborn child                      0.6455      0.03029     0.01723    0.01306        0.01327     0.00663    0.00664
IV    Contemporaneous barometric         0.7808      0.02416     0.02089    0.00327        0.00954     0.00803    0.00151
      heights*
V     Beans                              0.7811      0.01891     0.02090    0.00199        0.00709     0.00803    0.00094

                     N = 101                               N = 201                               N = 401
      (i) Non-    (ii)        Difference    (i) Non-    (ii)        Difference    (i) Non-    (ii)        Difference
      normal      Normal      |(i) − (ii)|  normal      Normal      |(i) − (ii)|  normal      Normal      |(i) − (ii)|

I     −0.00217    −0.00163    0.00054       −0.00109    −0.00082    0.00027       −0.00055    −0.00041    0.00014
II     0.00491     0.00294    0.00197        0.00246     0.00146    0.00100        0.00123     0.00073    0.00050
III    0.00683     0.00327    0.00356        0.00346     0.00163    0.00183        0.00174     0.00081    0.00093
IV     0.00475     0.00396    0.00079        0.00237     0.00197    0.00040        0.00118     0.00098    0.00020
V      0.00347     0.00396    0.00049        0.00171     0.00197    0.00026        0.00085     0.00098    0.00013

* Note that the figures in this row are for the original population IV (see p. 235 above).


Table 6. The comparative values of the standard deviation of the distribution of z, in samples
from normal and non-normal populations

                                       Population               N = 21                            N = 51
      Population                       value of ρ   (i) Non-    (ii)        Ratio       (i) Non-    (ii)        Ratio
                                                    normal      Normal      (i)/(ii)    normal      Normal      (i)/(ii)

I     Trumps                            −0.3305     0.23101     0.23517     0.982       0.14501     0.14424     1.005
II    Full year barometric heights       0.5807     0.25169     0.23449     1.073       0.15591     0.14408     1.082
III   Newborn child                      0.6455     0.25307     0.23425     1.080       0.15744     0.14402     1.093
IV    Contemporaneous barometric         0.7808     0.23893     0.23366     1.023       0.14698     0.14388     1.022
      heights*
V     Beans                              0.7811     0.28090     0.23366     1.202       0.16542     0.14388     1.150

                    N = 101                           N = 201                           N = 401
      (i) Non-    (ii)        Ratio       (i) Non-    (ii)        Ratio       (i) Non-    (ii)        Ratio
      normal      Normal      (i)/(ii)    normal      Normal      (i)/(ii)    normal      Normal      (i)/(ii)

I     0.10236     0.10099     1.014       0.07233     0.07106     1.018       0.05113     0.05012     1.020
II    0.10954     0.10093     1.085       0.07722     0.07104     1.087       0.05452     0.05011     1.088
III   0.11083     0.10091     1.098       0.07828     0.07103     1.102       0.05525     0.05011     1.103
IV    0.10307     0.10086     1.022       0.07260     0.07101     1.023       0.05125     0.05011     1.023
V     0.11404     0.10086     1.131       0.07961     0.07101     1.121       0.05592     0.05011     1.116

* Note that the figures in this row are for the original population IV (see p. 235 above).


Actual frequency curves for z in samples of 11 and 21, from the population IV, as considered
in §4, have been shown in Figs. 2a, b respectively. Corresponding curves of z for the
normal population have also been drawn in each case for comparison. Comparative values
of the mean and of σ_z for the population IV and the normal population are given in Table 7.
The values of γ₂(z) are:

                         N = 11     N = 21
Population IV            0.6475     0.2693
Normal population        0.2405     0.1101

























































8. THE NORMAL APPROXIMATION TO THE DISTRIBUTION OF z IN SAMPLES FROM 
A NON-NORMAL POPULATION 


The results of the previous section show that the distribution of z, in samples from 
population (1), is not far from normality with mean as in (73) and variance as in (74) 
provided the samples are of reasonable size. Accordingly, the problem of computation 
of the probability integral of r, for the frequency function (32), may be put, without serious 
loss of accuracy, in terms of those of the normal distribution. 

In the normal-theory case, the approximation

$$ \text{mean } z = \zeta + \frac{\rho}{2(N-1)} = Z_0(z), \tag{81} $$

$$ \operatorname{var} z = \frac{1}{N-1} + \frac{4-\rho^2}{2(N-1)^2} = \Sigma_0^2(z), \tag{82} $$

considers terms in mean z, (57), up to order (N − 1)⁻¹, and those in var z, (61), up to order
(N − 1)⁻². In practical application the above approximation, equations (81)-(82), is made
still rougher by using

$$ \text{mean } z = Z_0(z), \tag{81'} $$

$$ \operatorname{var} z = \frac{1}{N-3}, \tag{82'} $$

for while shortening the calculation needed, it leads in general to a sufficiently close estimate
of the actual probability integral in samples of moderate size.
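The two normal-theory approximations differ only in the variance they attach to z, and a few lines of arithmetic make the difference concrete. This is a sketch under our own naming, not a routine from the paper; it takes the mean of (81) for both forms, as Table 8 does, and compares the standard deviation from (82) with that from (82').

```python
import math


def z_normal_approximations(rho, n):
    """Return (mean from (81), sd from (82), sd from (82')) for Fisher's z."""
    zeta = math.atanh(rho)
    mean = zeta + rho / (2 * (n - 1))                                   # (81)
    sd_82 = math.sqrt(1 / (n - 1) + (4 - rho ** 2) / (2 * (n - 1) ** 2))  # (82)
    sd_82_prime = math.sqrt(1 / (n - 3))                                # (82')
    return mean, sd_82, sd_82_prime


if __name__ == "__main__":
    for n in (11, 21):
        mean, sd, sd_rough = z_normal_approximations(0.8, n)
        print(n, round(mean, 4), round(sd, 4), round(sd_rough, 4))
```

For ρ = 0.8 the rougher form (82') overstates the standard deviation of (82) by about 3 per cent at N = 11 and about 1 per cent at N = 21.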

In the present case we may take a similar approximation, as in (81)-(82), considering
terms up to order (N − 1)⁻¹, in (73) and those up to order (N − 1)⁻², in (74). Thus we have,

$$ \text{mean } z = Z_0(z) + \frac{\rho(3-\rho^2)(\lambda_{40}+\lambda_{04}) - 4(1+\rho^2)(\lambda_{31}+\lambda_{13}) + 2\rho(5+\rho^2)\lambda_{22}}{8(N-1)(1-\rho^2)^2} = Z(z), \text{ say}, \tag{83} $$

1 2+p* | (p?(Ago + Ags) — 40(Agi + Aig) + 2(2 +2) A 
varz = D3(z)+ Won-We a 40 + Ags) Tat (2+ p") Ags 


+ ypapp tal — 0°30) (Abo + Abs) + 4000 dng) — (4-+ 1398+ pt) (AB + AB) 
+ 12p(2 + p*) (Ag, Aye) — 4(1 + 2p) (Ago Arg + Ags Az1) 
+ 2p(5 + p®) (AgoAgy + Ags Arz)]/(1 — p?)>} 
= >(z), say. (84 

It may be of some interest to compare here the customary normal-theory approximations
involved in the use of z with those reached by the suggested corresponding shorter expressions
(83)-(84), for the mean and the variance of z, in the case of the non-normal population IV
based on the frequency of contemporaneous barometric heights. Table 7 below shows these
results for samples of size N = 11 and N = 21.

Table 7. Comparing approximations reached by the suggested shorter expressions
(formulae (83)-(84)) for the mean and the variance of z

            Population IV based on the frequency of
            contemporaneous barometric heights             Normal population
Size of     Mean z from         σ_z from                   Mean z from         σ_z from
sample      formula             formula                    formula             formula
            (73)      (83)      (74)      (84)             (57)      (81)      (61)      (82)

N = 11      1.1398    1.1356    0.3139    0.3099           1.1443    1.1386    0.3459    0.3418
N = 21      1.1182    1.1171    0.2070    0.2062           1.1200    1.1186    0.2336    0.2328

The approximations (83)-(84) for the non-normal case are seen to be as good as (81)-(82)
in the normal case. What is most noticeable, however, is the very considerable reduction in
σ_z for population IV, compared with the 'normal theory' figures.

Results reached for different approximations in the normal-theory case are given on the
left-hand side of Table 8. It will be seen that in a case like N = 11, for the normal population
(ρ = 0.8) both the approximations (81)-(82) and (81')-(82') are sufficiently close to the actual
distribution of z, though the former appears to be more satisfactory than the latter. In
general, the use of the actual value of ρ in the expression for the variance (i.e. the form of
approximation (81)-(82)) leads, particularly in the case of small samples, to closer estimates
of the probability of z.

Table 8. Comparison of the probability integrals of z under various assumptions for
the case N = 11, ρ = 0.8

                       Probability integral, sampling from              Probability integral, sampling
                       a normal population                              from population IV
                       (i) Normal         (ii) Normal     Actual,       Normal          Actual,
                       approx.            approx.         calculated    approx.         calculated
   r         z         (Z₀, 1/√(N−3))     (Z₀, Σ₀(z))     from (28)     (Z(z), Σ(z))    from (32)

  0.00     0.0000      0.00064            0.00043         0.00089       0.00013         0.00019
 +0.10     0.1003      0.00166            0.00119         0.00194       0.00042         0.00056
 +0.20     0.2027      0.00406            0.00309         0.00419       0.00132         0.00154
 +0.30     0.3095      0.00951            0.00763         0.00900       0.00385         0.00413
 +0.40     0.4236      0.02158            0.01822         0.01940       0.01080         0.01085
 +0.50     0.5493      0.04778            0.04232         0.04223       0.02925         0.02830
 +0.60     0.6931      0.10384            0.09621         0.09311       0.07668         0.07364
 +0.650    0.7753      0.15207            0.14388         0.13872       0.12247         0.11807
 +0.700    0.8673      0.22143            0.21364         0.20658       0.19328         0.18787
 +0.750    0.9730      0.31970            0.31394         0.30596       0.29981         0.29451
 +0.800    1.0986      0.45496            0.45341         0.44643       0.45244         0.44876

 +0.825    1.1723      0.53793            0.53923         0.53348       0.54704         0.54432
 +0.850    1.2562      0.63023            0.63455         0.63020       0.65129         0.64901
 +0.875    1.3540      0.72883            0.73575         0.73243       0.75947         0.75666
 +0.900    1.4722      0.82731            0.83550         0.83226       0.86126         0.85650
 +0.925    1.6226      0.91449            0.92163         0.91764       0.94193         0.93573
 +0.950    1.8318      0.97504            0.97873         0.97489       0.98765         0.98254
 +0.975    2.1847      0.99846            0.99890         0.99776       0.99964         0.99876
 +1.000      ∞         1.00000            1.00000         1.00000       1.00000         1.00000








































To examine the adequacy of the approximation (83)—(84) in small samples from moderately 
non-normal populations, we shall apply it in the case of the distribution of r in samples of 
11 from the population IV, based on contemporaneous barometric heights, already obtained 
in §4. 

Thus we have for N = 11, 


mean z = Z(z) = 1.135649, σ_z = Σ(z) = 0.309923. 


The results given on the right-hand side of Table 8 show a satisfactory agreement with the 
actual distribution, for the non-normal case considered. Indeed, the agreement compares 
well with that of the corresponding normal-theory case. 
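The normal-approximation columns of Table 8 follow from a single evaluation of the normal probability integral. The sketch below is an illustration under our own naming; it uses the standard-library error function for the normal integral, the mean and variance of (81), (82) and (82') for the normal population, and the values of the mean and standard deviation quoted above for population IV with N = 11.

```python
import math


def normal_cdf(x):
    """Standard normal probability integral, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def prob_r_below(r, mean_z, sd_z):
    """Approximate P(sample correlation <= r), treating z as normal."""
    return normal_cdf((math.atanh(r) - mean_z) / sd_z)


N, RHO = 11, 0.8
Z0 = math.atanh(RHO) + RHO / (2 * (N - 1))                            # mean, (81)
SIGMA0 = math.sqrt(1 / (N - 1) + (4 - RHO ** 2) / (2 * (N - 1) ** 2))  # sd, (82)

approx_i = prob_r_below(0.8, Z0, 1 / math.sqrt(N - 3))   # sd from (82')
approx_ii = prob_r_below(0.8, Z0, SIGMA0)                # sd from (82)
approx_iv = prob_r_below(0.8, 1.135649, 0.309923)        # population IV, values quoted above

if __name__ == "__main__":
    print(round(approx_i, 5), round(approx_ii, 5), round(approx_iv, 5))
```

Evaluated at r = 0.8 these agree with the corresponding normal-approximation entries of Table 8 to within a unit in the fifth decimal place; other rows are obtained by changing the first argument of `prob_r_below`.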


SUMMARY 


The mathematical form of the distribution of r in non-normal samples is obtained, the parent
population being specified by the bivariate Edgeworth surface including terms as far as
those in λ₃₀², λ₃₀λ₂₁, …, λ₀₃λ₁₂ and λ₀₃². Being applicable for any ρ, the derived law holds
good (i) for any size of sample if the fifth and higher order population λ's are negligible, and
(ii) for any population provided the samples are fairly large; it is not unlikely that the
formula has quite an extended range of applicability for moderate size of samples.

The tables of the ordinates of the corrective functions due to fourth-order and the square
and product of third-order semi-invariants (measuring respectively the excess and skewness
of the population) are obtained for N = 11, ρ = 0.0 and 0.8 and N = 21, ρ = 0.8. Frequency
curves for r and z (Fisher's logarithmic transformation of the coefficient of correlation) in
samples of 11 and 21, from a moderately non-normal population (ρ = 0.8 approximately) are
drawn for comparison with the corresponding curves of the normal-theory case.

In cases where ρ is not zero, the normal-theory law of r is found to be affected by changes
in the population form however large the sample size is. But when ρ is zero the effect of
non-normality on the distribution of r is not very serious and is smaller still if the variables
are completely independent. The true probabilities of r may be examined from the derived
distribution for a priori values of ρ and the λ's of the population.

The transformed variate z, also, is considerably influenced by the non-normality of the
population if ρ is not zero, but the disturbance is largely in the mean and variance, the
distribution being still nearly normal (although the approach towards normality is not so
rapid as in the normal-theory case). Practical examples taken indicate that for samples of
reasonable size the assumption of the normality of z is a remarkably good approximation.
Accordingly, the problem of computation of the probability integral of r for the derived law
may be put in terms of those of the normal distribution without serious loss of accuracy.
Illustration shows that for samples as small as 11 from a moderately non-normal population,
it is sufficient to include in mean z, (73), terms up to O[(N − 1)⁻¹] and in var z, (74), those up
to O[(N − 1)⁻²] only, thereby shortening the calculation of the probability integral of
r considerably.


In conclusion, I should like to acknowledge my indebtedness to Dr H. E. Daniels for his 
kind advice and criticism in the course of my investigations; also to Prof. E. S. Pearson for 
suggesting a number of improvements to the paper. 













REFERENCES* 


David, F. N. (1938). Tables of the Ordinates and Probability Integral of the Distribution of the Correlation
Coefficient in Small Samples. Cambridge University Press.

Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples
from an indefinitely large population. Biometrika, 10, 507.

Fisher, R. A. (1921). On the probable error of a coefficient of correlation deduced from a small sample.
Metron, 1, no. 4, 1.

Kendall, M. G. (1949). Rank and product-moment correlation. Biometrika, 36, 177.

Pearson, E. S. (1929). Some notes on sampling tests with two variables. Biometrika, 21, 337.

Pearson, K. (1925). The fifteen constant bivariate frequency surface. Biometrika, 17, 268.

Pepper, J. (1929). Studies in the theory of sampling. Biometrika, 21, 231.

Pretorius, S. J. (1930). Skew bivariate frequency surfaces examined in the light of numerical
illustrations. Biometrika, 22, 109.

Quensel, C. E. (1938). The distributions of the second moment and of the correlation coefficient in
samples from populations of Type A. Lunds Univ. Årsskr. N.F. Avd. 2, Bd. 34, Nr. 4; Kungl. Fys.
Säl. Handl. N.F. Bd. 49, Nr. 4.

Rider, P. R. (1932). The distribution of the correlation coefficient in small samples. Biometrika, 24, 382.

Romanovsky, V. (1925). On the moments of standard deviations and of correlation coefficient in
samples from normal population. Metron, 5, no. 4, 3.

Soper, H. E., Young, A. W., Cave, B. M., Lee, A. & Pearson, K. (1917). A Co-operative Study: On
the distribution of the correlation coefficient in small samples. Biometrika, 11, 328.

Tschuprow, A. A. (1925). Grundbegriffe und Grundprobleme der Korrelationstheorie. Leipzig: Teubner
(English translation as the Mathematical Theory of Correlation, William Hodge and Co. Ltd., 1939).

Van Uven, M. J. (1925-9). On treating skew correlation. Proc. K. Akad. Wet. Amst. 28, nos. 8-9,
797; 28, no. 10, 919; 29, no. 4, 580; 32, no. 4, 408.

Wicksell, S. D. (1917). The correlation function of Type A and the regression of its characteristics.
K. Svenska VetenskAkad. Handl. Bd. 58, no. 3, 1-48.


* A bibliography of the chief papers dealing with artificial sampling experiments from non-normal
populations is given by G. B. Hey, Biometrika, 30, 79 (1938).











MISCELLANEA


Some observations on the practical aspects of weighing designs
By K. S. BANERJEE, Pusa, Bihar, India


1. INTRODUCTION


Certain numerical illustrations of the author’s methods were given in a previous paper (Banerjee, 1950), 
and the following notes are intended to assist in understanding the working, and also to discuss some of 
the practical aspects of weighing designs. 


2. SPRING BALANCE DESIGNS IN BALANCED INCOMPLETE BLOCKS 
Let v objects be weighed in combinations of k at a time (k < v), such that each pair of objects occurs
together in the same number (λ) of weighings. Let r be the number of times each object is included in
a weighing (number of replicates) and b be the number of weighings (blocks). Then, as is well known from
the theory of balanced incomplete blocks, rv = bk and (v − 1)λ = r(k − 1). Suppose that the records of
the successive weighings are y₁, y₂, …, y_b. Let w₁, w₂, …, w_v be the estimates (by least squares) of the
weights W₁, W₂, …, W_v of the v objects. Then the normal equations determining the w_i are in general
(1950, pp. 50-1)

$$ \begin{aligned} r w_1 + \lambda w_2 + \lambda w_3 + \dots + \lambda w_v &= z_1, \\ \lambda w_1 + r w_2 + \lambda w_3 + \dots + \lambda w_v &= z_2, \\ &\;\;\vdots \\ \lambda w_1 + \lambda w_2 + \lambda w_3 + \dots + r w_v &= z_v, \end{aligned} \tag{1} $$

where z_i (i = 1, 2, …, v) are the elements of the vector X′Y, Y being the vector {y₁, y₂, …, y_b} and X′ the
transpose of the design matrix X.

Then from the elements of [X′X]⁻¹ (1950, equations (3) and (4)), we readily deduce that if T be written
for the sum of the z_i (i = 1, …, v), which incidentally is the same as kt, where t is the sum of the
y_j (j = 1, …, b), we have

$$ w_i = \frac{1}{r-\lambda}\left\{ z_i - \frac{\lambda T}{r+\lambda(v-1)} \right\}, \qquad i = 1, 2, \dots, v. $$

For a balanced incomplete blocks (b.i.b.) design this becomes

$$ w_i = \frac{1}{r-\lambda}\left\{ z_i - \frac{\lambda T}{rk} \right\} = \frac{1}{r-\lambda}\left\{ z_i - \frac{\lambda t}{r} \right\}, $$

which, with the aid of the design matrix X, can be expressed as a linear function of the y_j (j = 1, 2, …, b)
in order to work out the standard error of the w_i (see later).

In the first illustration (1950, p. 56) we have r = k = 4, λ = 2, whence w_i = ½(z_i − ⅛T), which was
the calculation actually used. Alternatively

$$ w_i = \tfrac{1}{2}\left(z_i - \tfrac{1}{2}t\right), $$

so that it is one-quarter of the sum of four selected values of y less the sum of the remaining three. Its
estimated variance will therefore be that of any y multiplied by 7/16.
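The closed form for w_i can be tried on a concrete design with r = k = 4, λ = 2. The sketch below is illustrative only: the particular blocks (complements of the lines of the seven-point plane generated by the difference set {0, 1, 3} mod 7) are our own choice and are not claimed to be the design of the 1950 illustration, and all names are hypothetical.

```python
from fractions import Fraction

V = B = 7          # objects and weighings
K = R = 4          # objects per weighing, weighings per object
LAM = 2            # weighings shared by each pair of objects

# Lines of the seven-point plane; each weighing uses the four objects NOT on a line.
LINES = [{(s + d) % 7 for d in (0, 1, 3)} for s in range(7)]
BLOCKS = [sorted(set(range(V)) - line) for line in LINES]


def estimate_weights(y):
    """Least-squares estimates w_i = (z_i - lambda * t / r) / (r - lambda)."""
    t = sum(y)
    z = [sum(y[j] for j, blk in enumerate(BLOCKS) if i in blk) for i in range(V)]
    return [(z_i - Fraction(LAM * t, R)) / (R - LAM) for z_i in z]


def quarter_rule(y):
    """One-quarter of (sum of the four y's containing object i less the other three)."""
    out = []
    for i in range(V):
        inside = sum(y[j] for j, blk in enumerate(BLOCKS) if i in blk)
        out.append(Fraction(inside - (sum(y) - inside), 4))
    return out


# Each estimate is a sum of seven y's with coefficients +1/4 or -1/4.
VARIANCE_FACTOR = 7 * Fraction(1, 4) ** 2

if __name__ == "__main__":
    true_weights = [3, 1, 4, 1, 5, 9, 2]          # made-up weights
    y = [sum(true_weights[i] for i in blk) for blk in BLOCKS]
    print(estimate_weights(y))
```

With error-free records the estimates reproduce the weights exactly, and the sum of squared coefficients gives the multiplier 7/16 for the variance.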
Now in general let the sum of the estimated weights (k in number) of the objects weighed on the
jth occasion (j = 1, 2, …, b) be Y_j. Then, for a b.i.b. design

$$ \sum_{j=1}^{b} (y_j - Y_j) = 0, $$

for, adding up the columns of (1) we have

$$ \{r+\lambda(v-1)\}\sum_{i=1}^{v} w_i = \sum_{i=1}^{v} z_i, \qquad\text{i.e.}\qquad rk\sum_{i=1}^{v} w_i = k\sum_{j=1}^{b} y_j. $$

Dividing both sides by k, and noting that each w_i enters r of the Y_j, we have

$$ \sum_{j=1}^{b} Y_j = r\sum_{i=1}^{v} w_i = \sum_{j=1}^{b} y_j. $$

Also

$$ \sum_{j=1}^{b} (y_j - Y_j)^2 = \sum_{j=1}^{b} y_j (y_j - Y_j) = \sum_{j=1}^{b} y_j^2 - \sum_{j=1}^{b} y_j Y_j = \sum_{j=1}^{b} y_j^2 - \sum_{i=1}^{v} w_i z_i, $$

from standard least square theory. This is a convenient way of calculating the sum of the squared residuals.
Another is to calculate

$$ \sum_{j=1}^{b} (y_j - \bar{y})^2 - \frac{1}{r-\lambda}\sum_{i=1}^{v} (z_i - \bar{z})^2, {}^{*} $$

where ȳ and z̄ denote the means of y_j and z_i respectively.
Then the estimated variance of y will be

$$ s_y^2 = \frac{\sum_{j=1}^{b} (y_j - Y_j)^2}{b - v}, $$

from which the estimated variance of w_i can be obtained. In the above illustration, for example,

$$ s_{w_i}^2 = \tfrac{7}{16}\, s_y^2, $$

being the same for all w_i. It should be noted, however, that no estimate is possible with this illustration,
nor, indeed, with any other b.i.b. design in which b = v. For in such cases we have Y_j = y_j for all j and thus

$$ \sum_{j=1}^{b} (y_j - Y_j)^2 = 0. $$

In the general case s²_{w_i} will be s²_y multiplied by

$$ \frac{1}{r} + \frac{\lambda^2 (b-r)}{r^2 (r-\lambda)^2} $$

for b.i.b. designs.
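The residual identities above can be checked on a small design with b > v, where the residuals do not vanish. The following is a hedged sketch rather than a procedure from the note: the design (all six pairs of four objects, so v = 4, k = 2, b = 6, r = 3, λ = 1), the readings and the function names are our own illustrative choices.

```python
from fractions import Fraction
from itertools import combinations

V, K = 4, 2
BLOCKS = list(combinations(range(V), K))   # every pair of objects weighed once
B = len(BLOCKS)                            # 6 weighings
R = B * K // V                             # 3 weighings per object
LAM = R * (K - 1) // (V - 1)               # each pair together once


def fit(y):
    """Return (z, w, fitted Y_j) for the b.i.b. design above."""
    t = sum(y)
    z = [sum(y[j] for j, blk in enumerate(BLOCKS) if i in blk) for i in range(V)]
    w = [(z_i - Fraction(LAM * t, R)) / (R - LAM) for z_i in z]
    fitted = [sum(w[i] for i in blk) for blk in BLOCKS]
    return z, w, fitted


def residual_ss_three_ways(y):
    """Sum of squared residuals: directly, and by the two shorter routes."""
    z, w, fitted = fit(y)
    direct = sum((yj - Yj) ** 2 for yj, Yj in zip(y, fitted))
    via_wz = sum(yj ** 2 for yj in y) - sum(wi * zi for wi, zi in zip(w, z))
    ybar, zbar = Fraction(sum(y), B), Fraction(sum(z), V)
    via_means = (sum((yj - ybar) ** 2 for yj in y)
                 - sum((zi - zbar) ** 2 for zi in z) / (R - LAM))
    return direct, via_wz, via_means


def variance_factor():
    """Multiplier of s_y^2 giving the estimated variance of any w_i."""
    return Fraction(1, R) + Fraction(LAM ** 2 * (B - R), R ** 2 * (R - LAM) ** 2)


if __name__ == "__main__":
    y = [5, 2, 7, 1, 8, 3]                 # made-up readings
    rss = residual_ss_three_ways(y)[0]
    print(rss, rss / (B - V), variance_factor())
```

Exact fractions are used so that the three routes to the residual sum of squares can be compared without rounding error.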


3. THE ELIMINATION OF A BIAS SUSPECTED AFTER THE WEIGHING OPERATIONS HAVE BEEN 
COMPLETED IN A SPRING BALANCE 


It has been shown (Hotelling, 1944; Mood, 1946) how it is possible to adjust the design matrix to suit the estimation of the weights of the objects in a biased chemical balance. The device is to reduce the elements of the first column of the design matrix to +1 (1944, 1946) without affecting the nature and efficiency of the design. But in the case of a spring balance design, where the elements of the matrix are restricted to be +1 or 0, it is not possible to take account of the bias by reducing the elements of the first column of the design matrix to +1 without affecting the design. If, instead, an additional column of +1's is added to the design, corresponding to a bias $w_0$, which can be looked on as the weight of an additional object, and the matrix is used in the weighing operations with the biased balance, then the normal equations will be $(v+1)$ in number, as follows:


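$$
\left.
\begin{aligned}
bw_0 + rw_1 + \dots + rw_v &= t' \\
rw_0 + (\text{the left-hand side of equations (1)}) &= z_1 \\
rw_0 + \qquad\cdots\qquad &= z_2 \\
\cdots\qquad & \\
rw_0 + \qquad\cdots\qquad &= z_v
\end{aligned}
\right\} \qquad (2)
$$

where $t', z_1, z_2, \ldots, z_v$ have the same significance as before in relation to the new records of weighings with the biased balance. Adding up both sides of (2), we have
$$(b+rv)w_0 + \{2r+\lambda(v-1)\}w_1 + \{2r+\lambda(v-1)\}w_2 + \dots + \{2r+\lambda(v-1)\}w_v = t' + kt',$$
i.e.
$$(k+1)(bw_0 + rw_1 + rw_2 + \dots + rw_v) = (k+1)t'.$$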


Cancelling $(k+1)$ from both sides, we get the first equation of (2). The equations (2) are not therefore independent, and as such, the estimate of $w_0$ cannot be obtained. If, on the other hand, a column of ones and a row of zeros in that order are added to the design matrix (1950, p. 52), it will be possible to estimate the bias. The normal equations in such a situation will take the form of (2) except for the first equation, which will be
$$(b+1)w_0 + rw_1 + rw_2 + \dots + rw_v = y_0 + t',$$
where $y_0$ is the record of weighing with empty pans, i.e. the record of weighing corresponding to the bias. Adding up the equations and subtracting the total from the first, we get $w_0 = y_0$. Subtracting $ry_0$ from all the equations beginning from the second, we get a new set of equations similar in form to (1), which can be easily solved as before.

* This result was communicated to me by Dr John Wishart.
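The two situations just described can be illustrated numerically. The Python sketch below is an illustration only: it reuses a small b.i.b. design (all pairs of four objects, k = 2), and the bias, the weights and the error scale are hypothetical. It shows that adding the column of ones alone leaves the normal equations dependent, whereas adding the empty-pan weighing as well makes them soluble, with the estimated bias equal to the empty-pan record.

```python
from itertools import combinations

import numpy as np

v = 4
blocks = list(combinations(range(v), 2))     # all pairs: b = 6, r = 3, k = 2
X = np.zeros((len(blocks), v))
for row, block in enumerate(blocks):
    X[row, list(block)] = 1.0
b = X.shape[0]

# (i) Bias column only: the column of ones is (1/k) times the sum of the
# object columns, so the rank stays at v and w_0 cannot be estimated.
X_ones = np.hstack([np.ones((b, 1)), X])
rank_ones = int(np.linalg.matrix_rank(X_ones))

# (ii) Bias column plus one weighing with empty pans (bias only, no objects).
empty_pan_row = np.zeros((1, v + 1))
empty_pan_row[0, 0] = 1.0
X_aug = np.vstack([empty_pan_row, X_ones])
rank_aug = int(np.linalg.matrix_rank(X_aug))

rng = np.random.default_rng(1)
bias = 0.25                                  # hypothetical bias of the balance
true_w = np.array([5.0, 3.0, 2.0, 1.0])      # hypothetical weights
y_aug = X_aug @ np.concatenate([[bias], true_w]) + rng.normal(scale=0.01, size=b + 1)

estimates = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y_aug)
w0_hat, y0 = float(estimates[0]), float(y_aug[0])
```

Here `rank_ones` equals v while `rank_aug` equals v + 1, and `w0_hat` reproduces `y0` up to rounding, in line with the result $w_0 = y_0$.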

It now remains to see if it is possible to eliminate a suspected bias from the weighing operations, the bias being suspected after the weighing operations have already been completed. In such a situation, the value of $y_0$ will be unknown.

Let us refer to the design matrix, where an additional column of ones and a row of zeros are added to a b.i.b. design to suit a biased balance, and let us suppose that the weighing operations have been made under the assumption that the balance was free of bias. Denoting by $w_i$ the estimate of the $i$th object when no bias is suspected and by $w_i^*$ the estimate when a bias is suspected, we get, by solving the normal equations,
$$w_i^* = \frac{1}{r-\lambda}\left\{(z_i-ry_0)-\frac{\lambda}{r}\left(t'-\frac{rv}{k}\,y_0\right)\right\} = w_i-\frac{(kr-\lambda v)}{(r-\lambda)k}\,y_0 = w_i-\frac{1}{k}\,y_0.$$
If now an attempt is made to estimate $y_0$ by minimizing the error variance, as is commonly done under the missing plot technique in agrobiological experiments (Yates, 1933), it is noticed that $y_0$ does not occur in the expression for the error variance on account of the fact that in each weighing operation, the objects are weighed $k$ at a time. While stating that the estimates cannot be determined free of the bias, it is noticed that the sum of squares of the residuals does not undergo any change.

The elimination of the bias may however be possible, if the design matrix is of the sub-matrix type or design incomplete in number of columns. An illustration (Mood, 1946) of seven columns of $L_{11}$ $(r = 6, \lambda = 3)$, with the records of weighings, is furnished below:

 w_1  w_2  w_3  w_4  w_5  w_6  w_7          g.
  1    0    1    0    0    0    1    =   9.11926
  1    1    0    1    0    0    0    =  11.33376
  0    1    1    0    1    0    0    =   6.95805
  1    0    1    1    0    1    0    =  10.63964
  1    1    0    1    1    0    1    =  14.45202
  1    1    1    0    1    1    0    =  13.86458
  0    1    1    1    0    1    1    =   9.78492
  0    0    1    1    1    0    1    =   6.85182
  0    0    0    1    1    1    0    =   4.63865
  1    0    0    0    1    1    1    =  10.02561
  0    1    0    0    0    1    1    =   6.05214

Each object is repeated an equal number of times ($r$), as also each pair of objects ($\lambda$), but the number of objects in the different weighing operations is not constant. Solving the normal equations, which are of the same nature as in (1), the estimates are obtained as 5.85823, 3.52843, 1.78603, 1.94687, 1.64352, 1.04845 and 1.47520. It may be noted in particular that in such designs
$$\sum_{1}^{b}(y_j-Y_j) \neq 0.$$


Let us suppose that the above design has been used in the weighing operations with a biased balance and that the value $y_0$ is not known. Solving the normal equations, the estimate of the bias is obtained as
$$w_0 = \tfrac{2}{3}y_0 + \tfrac{1}{3}(0.00017).$$
Here $w_i^* = w_i - \tfrac{1}{4}w_0$.

Solving the equation obtained by minimizing the error variance with respect to $y_0$ (1933), the value of $y_0$, as also that of $w_0$, is obtained as 0.00017. The amount of the necessary adjustment is therefore given by $\tfrac{1}{4}w_0 = 0.00004$.
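The figures of this illustration can be retraced with a short Python sketch. It rests on one assumption that should be stated plainly: the 0/1 rows of the design are not taken from the printed matrix but are inferred here by matching each record to a sum of the printed estimates (the result gives r = 6 and λ = 3 throughout). The variable names are hypothetical.

```python
import numpy as np

# Objects (1-based) assumed present in each of the 11 weighings; inferred by
# matching each record to a sum of the published estimates (an assumption).
blocks = [
    (1, 3, 7), (1, 2, 4), (2, 3, 5), (1, 3, 4, 6), (1, 2, 4, 5, 7),
    (1, 2, 3, 5, 6), (2, 3, 4, 6, 7), (3, 4, 5, 7), (4, 5, 6),
    (1, 5, 6, 7), (2, 6, 7),
]
records = np.array([9.11926, 11.33376, 6.95805, 10.63964, 14.45202, 13.86458,
                    9.78492, 6.85182, 4.63865, 10.02561, 6.05214])

X = np.zeros((len(blocks), 7))
for row, block in enumerate(blocks):
    X[row, [i - 1 for i in block]] = 1.0

# Estimates when no bias is suspected.
w_hat = np.linalg.solve(X.T @ X, X.T @ records)

# Bias treated as an extra unknown and fitted from the same 11 records; this
# gives the same value as minimizing the error variance with respect to y_0.
X_bias = np.hstack([np.ones((len(blocks), 1)), X])
fit = np.linalg.solve(X_bias.T @ X_bias, X_bias.T @ records)
w0_hat, w_star = float(fit[0]), fit[1:]

published = np.array([5.85823, 3.52843, 1.78603, 1.94687, 1.64352, 1.04845, 1.47520])
max_gap = float(np.max(np.abs(w_hat - published)))
```

Under this assumed design the unadjusted estimates agree with the published ones to the printed accuracy, the fitted bias is about 0.00017, and each adjusted estimate differs from the unadjusted one by a quarter of the bias.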


4. STANDARDIZATION OF WEIGHT-UNITS 


As standardization also means weighing operations, repeated weighings are needed to get steady estimates. An illustration is furnished here to show that the standardized weight-units may perhaps be obtained without making any separate weighings. The procedure may be adopted in situations where the weight-units to be standardized are not too numerous to render the solution of the normal equations lengthy.

Let us refer to the b.i.b. design given by $r = 5$, $b = 10$, $v = 6$, $k = 3$, $\lambda = 2$, and let us suppose that it was necessary to standardize the largest weight-units, namely the weight-units of 5 and 10 g. only. Two columns may be added to the design matrix to represent these two adjustments supposed as two additional objects ($w_{01}$ and $w_{02}$) to be weighed, the two columns being filled in with 1 or 0 as the case may be. For example, if, of the two weight-units which require standardization, the weight-unit of 5 g. only is needed, among other smaller weight-units, in a weighing operation, the weighing should be completed as usual, inserting a plus one in the column for $w_{01}$ and a zero in the column for $w_{02}$ in the relevant row. If neither of the two weight-units is used in any weighing operation, there will be zeros under both the columns in that particular row. The record of weighings with the modified design matrix is shown below:
 w_01  w_02   w_1  w_2  w_3  w_4  w_5  w_6          g.
   0     1     1    1    1    0    0    0    =  11.17575
   1     0     0    1    1    0    0    1    =   6.36704
   0     1     1    1    0    1    0    0    =  11.33746
   1     0     0    1    0    1    1    0    =   7.12503
   1     0     1    0    1    0    1    0    =   9.29941
   1     0     0    1    0    0    1    1    =   6.22389
   1     0     1    0    0    1    0    1    =   8.86387
   1     0     0    0    1    1    1    0    =   5.37762
   1     0     1    0    0    0    1    1    =   8.56015
   0     0     0    0    1    1    0    1    =   4.79442
The normal equations are given by:

 w_01  w_02   w_1  w_2  w_3  w_4  w_5  w_6          g.
   7     0     3    3    3    3    5    4    =  51.81701   (1)
   0     2     2    2    1    1    0    0    =  22.51321   (2)
   3     2     5    2    2    2    2    2    =  49.23664   (3)
   3     2     2    5    2    2    2    2    =  42.22917   (4)
   3     1     2    2    5    2    2    2    =  37.01424   (5)
   3     1     2    2    2    5    2    2    =  37.49840   (6)
   5     0     2    2    2    2    5    2    =  36.58610   (7)
   4     0     2    2    2    2    2    5    =  34.80937   (8)
The solution of the normal equations is not difficult. If $\tfrac{2}{15}$ of the sum of equations (3) to (8) (i.e. the equations corresponding to $w_1, w_2, \ldots, w_6$) be subtracted from each, the left-hand sides will contain only the variables $w_{01}$ and $w_{02}$ and an additional variable $w_i$. These equations may be utilized to eliminate the $w_i$'s from the first two equations. In the present instance, the equations in $w_{01}$ and $w_{02}$ are obtained as
$$14w_{01} - 6w_{02} = -0.026610,$$
$$-6w_{01} + 4w_{02} = -0.027480.$$

When the values of $w_{01}$ and $w_{02}$ are known, appropriate multiples of these values may be subtracted from each of the equations (3) to (8) to get a set of equations which admit of easy solution. The estimates of the adjustments, $w_{01}$ and $w_{02}$, and of the objects, are obtained as −0.01357, −0.02722; and 5.87405, 3.53823, 1.79085, 1.95223, 1.64811 and 1.05134 respectively.
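The whole calculation of this section can be retraced in a few lines of Python. The sketch below is an illustration under stated assumptions: the blocks of the b.i.b. design and the rows in which the 5 g. and 10 g. weight-units enter are read from the record of weighings, and two records that are hard to read in the printed table (9.29941 and 5.37762) are taken to be the values implied by the printed right-hand sides of the normal equations. Variable names are hypothetical.

```python
import numpy as np

# Objects (1-based) weighed in each of the ten operations (assumed reading).
blocks = [(1, 2, 3), (2, 3, 6), (1, 2, 4), (2, 4, 5), (1, 3, 5),
          (2, 5, 6), (1, 4, 6), (3, 4, 5), (1, 5, 6), (3, 4, 6)]
records = np.array([11.17575, 6.36704, 11.33746, 7.12503, 9.29941,
                    6.22389, 8.86387, 5.37762, 8.56015, 4.79442])
uses_5g = [0, 1, 0, 1, 1, 1, 1, 1, 1, 0]    # column for the 5 g. adjustment
uses_10g = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]   # column for the 10 g. adjustment

X = np.zeros((len(blocks), 8))
X[:, 0] = uses_5g
X[:, 1] = uses_10g
for row, block in enumerate(blocks):
    X[row, [i + 1 for i in block]] = 1.0     # object i sits in column i + 1

normal_matrix = X.T @ X                      # left-hand sides of equations (1) to (8)
normal_rhs = X.T @ records                   # right-hand sides
estimates = np.linalg.solve(normal_matrix, normal_rhs)

published = np.array([-0.01357, -0.02722, 5.87405, 3.53823,
                      1.79085, 1.95223, 1.64811, 1.05134])
max_gap = float(np.max(np.abs(estimates - published)))
```

With these assumed inputs the coefficient matrix of the normal equations is reproduced exactly, and the solution agrees with the published adjustments and weights to within rounding in the fifth decimal place.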


My grateful thanks are due to Prof. R. C. Bose and to Dr John Wishart for helpful suggestions in connexion with this work.
REFERENCES

BANERJEE, K. S. (1950). How balanced incomplete block designs may be made to furnish orthogonal estimates in weighing designs. Biometrika, 37, 50.
HOTELLING, H. (1944). Some improvements in weighing and other experimental techniques. Ann. Math. Statist. 15, 297.
MOOD, A. M. (1946). On Hotelling's weighing problem. Ann. Math. Statist. 17, 432.
YATES, F. (1933). The analysis of replicated experiments when the field results are incomplete. Emp. J. Exp. Agric. 1, 129.




Test for the significance of the difference between means in two normal populations 
having unequal variances 


By D. G. C. GRONOW, University College, London 


It is assumed that samples of size $n_1$ and $n_2$ are drawn from normal populations $\Pi_1$ and $\Pi_2$ respectively. The sample units are $x_{ti}$ ($t = 1, 2$; $i = 1, 2, \ldots, n_t$) and the question is posed: ‘Are the data consistent with the hypothesis that the two population means $\xi_1$ and $\xi_2$ are equal?’ In the case where the population variances cannot be assumed equal, several criteria have been put forward to test the hypothesis $\xi_1 = \xi_2$, and the problem has been discussed by numerous writers. In the present paper we shall consider only two test statistics which Welch (1937) denoted by $u$ and $v$:
$$u = \frac{\bar x_1-\bar x_2}{\left\{\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)\dfrac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}\right\}^{\frac12}}, \qquad v = \frac{\bar x_1-\bar x_2}{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^{\frac12}},$$
where
$$s_t^2 = \sum_{i=1}^{n_t}(x_{ti}-\bar x_t)^2/(n_t-1).$$
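For readers who wish to compute the two criteria, a minimal Python sketch follows. It assumes the usual forms of the statistics (pooled variance for $u$, separate variances for $v$); the function name and the sample values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_u_v(x1, x2):
    """Return (u, v) for two samples; u pools the variances, v does not."""
    n1, n2 = len(x1), len(x2)
    diff = mean(x1) - mean(x2)
    s1_sq, s2_sq = variance(x1), variance(x2)   # divisors n - 1
    pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    u = diff / math.sqrt(pooled * (1 / n1 + 1 / n2))
    v = diff / math.sqrt(s1_sq / n1 + s2_sq / n2)
    return u, v


# Equal sample sizes: the two criteria coincide.
u_equal, v_equal = welch_u_v([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])

# Unequal sample sizes: they generally differ.
u_unequal, v_unequal = welch_u_v([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2.0, 4.0, 9.0])
```

The first call illustrates the remark made below that $u = v$ when $n_1 = n_2$; the second shows the two statistics separating once the sample sizes differ.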








It seemed of interest to investigate the power of u and v to establish significance when there is a real 
difference in the population means and the variances may be unequal. This problem has already been 
solved in principle by Hsu (1938), but his solution is not in a form which readily allows of numerical 
calculation. 

The method of procedure followed has been to obtain approximations to the moments of the distribu- 
tions of u and v by the use of Fisher’s k-statistics. Thus in the general approach to the problem, of which 
the work here is a simplified case, nothing is assumed about the two populations except that the cumulants 
of each exist, but are different one from the other. In the simplified case considered in this note, we suppose 
that the two populations are normal but may have different means and variances. Giving dashes to the 
k-statistics of the second population sampled, it is clear that both u and v may be written in the form 


$$b(k_1-k_1')(k_2+ak_2')^{-\frac12},$$
where the constants $a$ and $b$ are different for $u$ and for $v$, except where $n_1 = n_2$, when $u = v$. A single expansion will thus serve for both cases.

Let $$z = (k_1-k_1')(k_2+ak_2')^{-\frac12}.$$


Expanding by Taylor’s theorem about the point $(\kappa_1, \kappa_1', \kappa_2, \kappa_2')$, taking expectations and collecting terms, we have, to order $(n-1)^{-4}$,


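$$
\begin{aligned}
E(z) = \frac{\delta}{\tau^{\frac12}}\Biggl[\,1 &+ \frac{1}{n_1-1}\,\frac{3\kappa_2^2}{4\tau^2} + \frac{1}{n_2-1}\,\frac{3a^2\kappa_2'^2}{4\tau^2}\\
&- \frac{1}{(n_1-1)^2}\,\frac{5\kappa_2^3}{2\tau^3}\Bigl(1-\frac{21\kappa_2}{16\tau}\Bigr) + \frac{1}{(n_1-1)(n_2-1)}\,\frac{105a^2\kappa_2^2\kappa_2'^2}{16\tau^4} - \frac{1}{(n_2-1)^2}\,\frac{5a^3\kappa_2'^3}{2\tau^3}\Bigl(1-\frac{21a\kappa_2'}{16\tau}\Bigr)\\
&+ \frac{1}{(n_1-1)^3}\,\frac{105\kappa_2^4}{8\tau^4}\Bigl(1-\frac{3\kappa_2}{\tau}+\frac{33\kappa_2^2}{16\tau^2}\Bigr) - \frac{1}{(n_1-1)^2(n_2-1)}\,\frac{315a^2\kappa_2^3\kappa_2'^2}{8\tau^5}\Bigl(1-\frac{33\kappa_2}{16\tau}\Bigr)\\
&- \frac{1}{(n_1-1)(n_2-1)^2}\,\frac{315a^3\kappa_2^2\kappa_2'^3}{8\tau^5}\Bigl(1-\frac{33a\kappa_2'}{16\tau}\Bigr) + \frac{1}{(n_2-1)^3}\,\frac{105a^4\kappa_2'^4}{8\tau^4}\Bigl(1-\frac{3a\kappa_2'}{\tau}+\frac{33a^2\kappa_2'^2}{16\tau^2}\Bigr)\\
&- \frac{1}{(n_1-1)^4}\,\frac{21\kappa_2^5}{2\tau^5}\Bigl(9-\frac{715\kappa_2}{16\tau}+\frac{2145\kappa_2^2}{32\tau^2}-\frac{32175\kappa_2^3}{1024\tau^3}\Bigr)\\
&+ \frac{1}{(n_1-1)^3(n_2-1)}\,\frac{3465a^2\kappa_2^4\kappa_2'^2}{32\tau^6}\Bigl(3-\frac{13\kappa_2}{\tau}+\frac{195\kappa_2^2}{16\tau^2}\Bigr)\\
&+ \frac{1}{(n_1-1)^2(n_2-1)^2}\,\frac{1155a^3\kappa_2^3\kappa_2'^3}{4\tau^6}\Bigl(1-\frac{39(\kappa_2+a\kappa_2')}{16\tau}+\frac{1755a\kappa_2\kappa_2'}{256\tau^2}\Bigr)\\
&+ \frac{1}{(n_1-1)(n_2-1)^3}\,\frac{3465a^4\kappa_2^2\kappa_2'^4}{32\tau^6}\Bigl(3-\frac{13a\kappa_2'}{\tau}+\frac{195a^2\kappa_2'^2}{16\tau^2}\Bigr)\\
&- \frac{1}{(n_2-1)^4}\,\frac{21a^5\kappa_2'^5}{2\tau^5}\Bigl(9-\frac{715a\kappa_2'}{16\tau}+\frac{2145a^2\kappa_2'^2}{32\tau^2}-\frac{32175a^3\kappa_2'^3}{1024\tau^3}\Bigr)\Biggr],
\end{aligned}
$$
where $\delta = \kappa_1-\kappa_1' = \xi_1-\xi_2$ and $\tau = (\kappa_2+a\kappa_2')$.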
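The bracketed series can be checked in two special cases where the exact answer is known, since for normal samples $(k_2+ak_2')/\tau$ is then a chi-square variate divided by its degrees of freedom. The Python sketch below is such a check, not part of the original note; it writes $x = \kappa_2/\tau$, $y = a\kappa_2'/\tau$, $m_1 = n_1-1$, $m_2 = n_2-1$ (names chosen here for convenience) and compares the series with the exact mean of the inverse square root.

```python
import math


def bracket_series(x, y, m1, m2):
    """Bracketed series for E(z)*sqrt(tau)/delta, to order (n - 1)^-4."""
    order1 = 3 * x**2 / (4 * m1) + 3 * y**2 / (4 * m2)
    order2 = (-(5 * x**3 / 2) * (1 - 21 * x / 16) / m1**2
              + 105 * x**2 * y**2 / (16 * m1 * m2)
              - (5 * y**3 / 2) * (1 - 21 * y / 16) / m2**2)
    order3 = ((105 * x**4 / 8) * (1 - 3 * x + 33 * x**2 / 16) / m1**3
              - (315 * x**3 * y**2 / 8) * (1 - 33 * x / 16) / (m1**2 * m2)
              - (315 * x**2 * y**3 / 8) * (1 - 33 * y / 16) / (m1 * m2**2)
              + (105 * y**4 / 8) * (1 - 3 * y + 33 * y**2 / 16) / m2**3)
    order4 = (-(21 * x**5 / 2) * (9 - 715 * x / 16 + 2145 * x**2 / 32
                                  - 32175 * x**3 / 1024) / m1**4
              + (3465 * x**4 * y**2 / 32) * (3 - 13 * x + 195 * x**2 / 16) / (m1**3 * m2)
              + (1155 * x**3 * y**3 / 4) * (1 - 39 * (x + y) / 16
                                            + 1755 * x * y / 256) / (m1**2 * m2**2)
              + (3465 * x**2 * y**4 / 32) * (3 - 13 * y + 195 * y**2 / 16) / (m1 * m2**3)
              - (21 * y**5 / 2) * (9 - 715 * y / 16 + 2145 * y**2 / 32
                                   - 32175 * y**3 / 1024) / m2**4)
    return 1 + order1 + order2 + order3 + order4


def exact_inverse_root_mean(df):
    """E[(chi2_df / df)^(-1/2)] for a chi-square variate with df degrees of freedom."""
    return math.sqrt(df / 2) * math.exp(math.lgamma((df - 1) / 2) - math.lgamma(df / 2))


# Case 1: a = 0, so only the first sample's variance enters (9 d.f.).
series_single = bracket_series(1.0, 0.0, 9, 9)
exact_single = exact_inverse_root_mean(9)

# Case 2: a = 1, equal variances, n_1 = n_2 = 10: a pooled estimate on 18 d.f.
series_pooled = bracket_series(0.5, 0.5, 9, 9)
exact_pooled = exact_inverse_root_mean(18)
```

In both cases the series and the exact value differ only by a quantity of the order of the first neglected term, which is what one would expect of an expansion carried to order $(n-1)^{-4}$.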

By taking second, third and fourth powers of $z$, expanding in a way similar to the above and taking expectations, the second, third and fourth moments of $z$ were obtained. These moments are, however, of great length, and there appears to be little point in reproducing them here. Numerical values were substituted in these expressions, and the appropriate Pearson or Gram-Charlier Type A curve (with the same mean, standard deviation, $\beta_1$ and $\beta_2$) was assumed to be an approximation to the unknown true distribution.

Following the procedure briefly outlined, the distributions of $u$ and $v$ under hypothesis $H_0$ (i.e. $\delta = 0$) were first investigated. As Welch (1937) has already pointed out, if the variances are not the same in the two populations sampled, the assumption that $u$ is distributed as ‘Student’s’ $t$ with $n_1+n_2-2$ degrees of freedom will lead to bias in tests of significance. His results are confirmed by the present investigation. Taking the case of $n_1 = n_2 = 10$ ($u = v$), a Pearson Type VII with mean at zero and second and fourth moments obtained by the k-statistic method was used as an approximation to the true distribution. By this means, the probability that $u$ (= $v$) falls beyond the 5 and 1 % significance levels of the t-distribution with 18 degrees of freedom was calculated. These approximate probabilities are shown in Table 1a, the double-tailed test having been considered.


Table 1a. Probability, as a percentage, that $u$ (= $v$) falls beyond the 5 and 1 % points of the t-distribution with 18 degrees of freedom, calculated by the approximate method

Case: $n_1 = n_2 = 10$, $\delta = 0$

Ratio κ_2'/κ_2      1      1.5     2      2.5     3
5 %               5.00   5.05   5.13   5.22   5.30
1 %               1.00   1.01   1.04   1.09   1.13


























The figures in the 5 % row correspond to those which can be read from the curve (a) given in Fig. 1 of Welch’s (1937) paper, where he used a different method of approximation. The bias in this case is nowhere very serious.

Similar calculations for the case $n_1 = 15$, $n_2 = 5$, $\delta = 0$ were carried out. These are given in Table 1b, the results in this case being different for $u$ and for $v$.


Table 1b. Probability, as a percentage, that $u$ and $v$ fall beyond the 5 and 1 % points of the t-distribution with 18 degrees of freedom

Case: $n_1 = 15$, $n_2 = 5$, $\delta = 0$

Criterion   Ratio κ_2'/κ_2    1/3     1/2      1       2       3
u           5 %              1.40    2.25    5.00    9.77   13.20
            1 %              0.15    0.31    1.00    2.81    4.55
v           5 %              5.47    5.85    6.83    7.99    8.67
            1 %              1.15    1.33    1.81    2.54    2.94
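The order of magnitude of the entries in Table 1b can be examined by direct sampling, a route quite different from the approximate method of the note. The Python sketch below is an illustration only: the number of repetitions, the seed and the function name are arbitrary choices, and the 5 % point of $t$ with 18 degrees of freedom is taken from standard tables as 2.101.

```python
import numpy as np

T18_5PCT = 2.101  # two-sided 5 % point of t, 18 d.f. (standard tables)


def rejection_rates(n1, n2, ratio, reps=40000, seed=0):
    """Simulated chance that |u| and |v| exceed the t_18 point when delta = 0.

    ratio is the variance of the second population divided by that of the first.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.normal(0.0, 1.0, size=(reps, n1))
    x2 = rng.normal(0.0, np.sqrt(ratio), size=(reps, n2))
    diff = x1.mean(axis=1) - x2.mean(axis=1)
    s1_sq = x1.var(axis=1, ddof=1)
    s2_sq = x2.var(axis=1, ddof=1)
    pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    u = diff / np.sqrt(pooled * (1 / n1 + 1 / n2))
    v = diff / np.sqrt(s1_sq / n1 + s2_sq / n2)
    return float(np.mean(np.abs(u) > T18_5PCT)), float(np.mean(np.abs(v) > T18_5PCT))


u_rate_equal, v_rate_equal = rejection_rates(15, 5, 1.0)
u_rate_3, v_rate_3 = rejection_rates(15, 5, 3.0)
```

With equal variances the simulated rate for $u$ stays near the nominal 5 %, while with a variance ratio of 3 both criteria reject too often and $u$ is the more seriously affected, which is the pattern shown in the table.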



















The values for $u$ correspond approximately to those which can be read from Welch’s curve (b) and those for $v$ from his curve (c). The bias, as might be expected for the case of unequal sample sizes, is much worse than in the case previously considered. As Welch pointed out, the bias is rather less using $v$ than using $u$.

The approximations to the true moments of $u$ and $v$ can also be used to answer the following questions. How will inequality in the population variances affect the chance of establishing the significance of a difference, $\delta$, in means if (i) $u$ and (ii) $v$ are referred to the t-distribution with $n_1+n_2-2$ degrees of freedom? If inequality in variances is suspected, which test would it be preferable to use? As the figures in Tables 1a and 1b show, as soon as the variances become unequal, the percentage levels for $t$ are no longer correct for $u$, and even when $\kappa_2 = \kappa_2'$, they are not correct for $v$ unless $n_1 = n_2$. However, without the knowledge of $\kappa_2'/\kappa_2$, it is not possible to correct the levels. Let us consider, therefore, what happens if we do use the $t$ levels, lacking any means of correction. The results of following this procedure are shown in Tables 2 and 3.


Table 2. Power of the $u$ and $v$ tests to detect a difference $\delta$ between population means (referred to 5 %, $t_{18}$, significance point)

(a) Case: $n_1 = n_2 = 10$, $\alpha = 0.05$ (error of first kind, i.e. when $\delta = 0$, $\kappa_2 = \kappa_2'$). Power of $u$ (= $v$, in this case)

κ_2'/κ_2   δ/√κ_2:    0      0.5      1      1.5      2      2.5      3
1                   0.050   0.182   0.562   0.892   0.987   0.998   1.000
1.5                 0.050   0.156   0.471   0.816   0.966   0.994   0.999
2                   0.051   0.139   0.406   0.743   0.934   0.989   0.998
2.5                 0.052   0.127   0.357   0.675   0.896   0.977   0.996
3                   0.053   0.120   0.320   0.616   0.854   0.961   0.992
































(b) Case: $n_1 = 15$, $n_2 = 5$, $\alpha = 0.05$. Power of $u$ test

κ_2'/κ_2   δ/√κ_2:    0      0.5      1      1.5      2      2.5      3
1/3                 0.014   0.111   0.518   0.912   0.988   0.998   1.000
1/2                 0.023   0.124   0.492   0.876   0.970   0.996   0.999
1                   0.050   0.148   0.446   0.792   0.958   0.994   0.998
2                   0.098   0.179   0.396   0.676   0.878   0.968   0.994
3                   0.132   0.194   0.367   0.603   0.805   0.926   0.977
































(c) Case: $n_1 = 15$, $n_2 = 5$, $\alpha = 0.05$. Power of $v$ test

κ_2'/κ_2   δ/√κ_2:    0      0.5      1      1.5      2      2.5      3
1/3                 0.055   0.259   0.744   0.968   0.994   0.999   1.000
1/2                 0.059   0.220   0.653   0.935   0.987   0.997   1.000
1                   0.068   0.171   0.471   0.797   0.947   0.991   0.998
2                   0.080   0.132   0.299   0.594   0.829   0.940   0.978
3                   0.087   0.118   0.217   0.438   0.679   0.843   0.930




































































Table 3. Power of the $u$ and $v$ tests to detect a difference $\delta$ between population means (referred to 1 %, $t_{18}$, significance point)

(a) Case: $n_1 = n_2 = 10$, $\alpha = 0.01$ (error of first kind, i.e. when $\delta = 0$, $\kappa_2 = \kappa_2'$). Power of $u$ (= $v$, in this case)

κ_2'/κ_2   δ/√κ_2:    0      0.5      1      1.5      2      2.5      3
1                   0.010   0.061   0.276   0.687   0.933   0.992   0.999
1.5                 0.010   0.051   0.219   0.561   0.860   0.973   0.996
2                   0.010   0.044   0.177   0.464   0.778   0.942   0.989
2.5                 0.011   0.040   0.150   0.391   0.697   0.898   0.975
3                   0.011   0.038   0.131   0.335   0.623   0.849   0.955






































(b) Case: $n_1 = 15$, $n_2 = 5$, $\alpha = 0.01$. Power of $u$ test

κ_2'/κ_2   δ/√κ_2:    0      0.5      1      1.5      2      2.5      3
1/3                 0.001   0.028   0.192   0.657   0.941   0.995   1.000
1/2                 0.003   0.033   0.200   0.609   0.912   0.988   0.998
1                   0.010   0.047   0.201   0.525   0.836   0.965   0.994
2                   0.028   0.068   0.19?   0.433   0.703   0.888   0.969
3                   0.045   0.082   0.194   0.382   0.613   0.808   0.923
































(c) Case: $n_1 = 15$, $n_2 = 5$, $\alpha = 0.01$. Power of $v$ test

κ_2'/κ_2   δ/√κ_2:    0      0.5      1      1.5      2      2.5      3
1/3                 0.011   0.101   0.473   0.878   0.983   0.994   0.999
1/2                 0.013   0.083   0.372   0.787   0.961   0.989   0.995
1                   0.018   0.071   0.232   0.557   0.827   0.947   0.988
2                   0.025   0.057   0.127   0.313   0.597   0.813   0.922
3                   0.029   0.044   0.098   0.195   0.402   0.627   0.789
































When $n_1 = n_2 = 10$, the chance of rejecting the null hypothesis when it is true, i.e. when $\delta = 0$, never differs much from the nominal 0.05 and 0.01. Inequality in variances does, however, somewhat reduce the power of the test. For example, the chance of establishing significance at the 5 % level when $\delta = 1.5\sqrt{\kappa_2}$ is 0.89 if $\kappa_2' = \kappa_2$, but has fallen to 0.62 if $\kappa_2' = 3\kappa_2$.

When the sample sizes are unequal, the position is rather different. We have to take note of which sample comes from the population with the larger variance; the tables, therefore, show results for $\kappa_2'/\kappa_2$ greater and less than unity. Comparing Tables 2b and c, 3b and c, the following points may be noted:

(i) The significance levels of ‘Student’s’ $t$ may be very inaccurate; if the variances are nearly equal, the bias is less for $u$, but when they differ widely $v$ is affected less than $u$.

(ii) Whether $u$ or $v$ is employed, for a given difference in means of detectable size, the chance of establishing significance is considerably larger when $\kappa_2'/\kappa_2 < 1$ than when it is $> 1$. This means, as one would expect, that there is a considerable gain if the larger sample is taken from the population with the larger variance.










(iii) Owing to the varying bias in the significance levels, it is not possible to make precise comparisons, 
but the following figures taken from Table 2 are illustrative of what occurs. While figures of this kind 
indicate the performance characteristic of the two procedures, it is difficult to suggest any simple basis 
on which to decide which it is preferable to adopt. 


Table 4. Examples of the consequences of rejecting the null hypothesis whenever $u$ (or $v$) exceeds the 5 % level of $t$ with 18 degrees of freedom

κ_2'/κ_2  Test       Chance of:                                     n_1 = 15   n_1 = 10   n_1 = 5
          statistic                                                 n_2 = 5    n_2 = 10   n_2 = 15
1         u          Rejecting H_0 when δ = 0                        0.050      0.050      0.050
                     Establishing significance if δ = 1.5√κ_2        0.792      0.892      0.792
          v          Rejecting H_0 when δ = 0                        0.068      0.050      0.068
                     Establishing significance if δ = 1.5√κ_2        0.797      0.892      0.797
1/2       u          Rejecting H_0 when δ = 0                        0.023      0.051      0.098
                     Establishing significance if δ = 1.5√κ_2        0.876      0.743      0.676
          v          Rejecting H_0 when δ = 0                        0.059      0.051      0.080
                     Establishing significance if δ = 1.5√κ_2        0.935      0.743      0.594
1/3       u          Rejecting H_0 when δ = 0                        0.014      0.053      0.132
                     Establishing significance if δ = 1.5√κ_2        0.912      0.616      0.603
          v          Rejecting H_0 when δ = 0                        0.055      0.053      0.087
                     Establishing significance if δ = 1.5√κ_2        0.968      0.616      0.438





Finally, it should be noted that it is to overcome these difficulties that quite different alternative test procedures have been suggested, leading to Sukhatme’s (1938) tables based on the Fisher-Behrens test and Aspin’s (1949) tables based on Welch’s (1947) test.


REFERENCES

ASPIN, ALICE A. (1949). Biometrika, 36, 290.
HSU, P. L. (1938). Statist. Res. Mem. 2, 1.
SUKHATME, P. V. (1938). Sankhyā, 4, 39.
WELCH, B. L. (1937). Biometrika, 29, 350.
WELCH, B. L. (1947). Biometrika, 34, 28.













REVIEWS 


Contributions to Mathematical Statistics. By R. A. Fisher. New York: John Wiley and Sons Inc.; London: Chapman and Hall. 1950. Price £3.


This latest of Wiley’s series of publications in statistics contains forty-two of R. A. Fisher’s statistical 
papers covering the years 1920-43, reproduced to scale from the originals by photo-offset process. 
Although even with this method of reproduction, the price of the volume is heavy, and may be beyond 
the reach of the average student, it will now be possible for all these papers to be available in Libraries, 
a welcome result. The book has a preface by W. A. Shewhart, the general editor of the series, and this is 
followed by P. C. Mahalanobis’s biographical note written on the occasion of Fisher’s visit to India in 
1938. Each individual paper is preceded by an expository note written by the author. 

In turning over these pages, statisticians of the reviewer’s generation must be conscious of a wealth of 
crowding impressions. The history of the modern theory and practice of mathematical statistics may be 
said to have opened sixty years ago. The place that Karl Pearson’s papers played during the period 1890-1915
was played by R. A. Fisher’s papers appearing in this volume during the years 1920-40. Both series
have had a profound influence in shaping the outlook of their statistical contemporaries, through what is 
written for all to read and through those suggestive flashes of illumination which lead the recipient 
forward either in agreement or disagreement. 

Fisher’s work started from the basis of statistical theory and practice as he found it, the theory of errors, 
Pearson’s frequency curves, the methods of correlation and regression measurement, the application of 
$\chi^2$ in many directions, the attack on the intriguing problems of heredity and evolution. The advances
which he made have been very great, but the continuity in development is clear and many of the earlier 
papers in these Contributions to Mathematical Statistics are further steps along paths mapped out by his 
predecessor in the Royal Society Series, Mathematical Contributions to the Theory of Evolution. 

Several of Fisher’s early papers are not included in this volume; thus two papers on correlation (1921, 1924) and a third on ‘Student’s’ $t$ and allied distributions (1925) from Metron are missing, as well as the fundamental paper from Biometrika (1915) on the distribution of $r$ in samples from a normal population.* The decision to omit the 1915 paper will be generally regretted, for we are deprived of the first introduction to the geometrical representation of $\bar x$, $s$, $t$ and $r$, that ‘exceedingly beautiful interpretation in generalised space’, to use its author’s own words, that must have fascinated so many students of statistics through thirty-five years.

The leading paper (2, 1920) in the series is one rather little known by statisticians because of its publication in the Monthly Notices of the Royal Astronomical Society, but it opened the way into a field crowded with new ideas which at the time no one but the author himself was ready to follow up. In this paper Fisher takes up the old problem of the astronomers, which Gauss had considered and, unknown to statisticians, Helmert had solved up to a point, that of the choice between the root-mean-square estimate (called $\sigma_2$) and the mean-deviation estimate ($\sigma_1$) of a population standard deviation ($\sigma$). Using the geometrical methods which he had put forward in 1915, Fisher derives and compares the standard errors of $\sigma_1$ and $\sigma_2$ as Helmert had done in 1875. But he goes beyond this by considering the joint distribution of $\sigma_1$, $\sigma_2$ and thence shows (a) that for given $\sigma_2$, the distribution of $\sigma_1$ must be independent of $\sigma$, (b) that this result will hold when any other estimator $\sigma_3$ is compared with $\sigma_2$. Further, (c) he suggests that while this property is possessed by $\sigma_2$ in sampling from the normal population, for a leptokurtic distribution such as the double exponential the roles may be reversed, so that the mean deviation, $\sigma_1$, will have this optimum property. Thus the fundamental conception of a ‘sufficient statistic’ begins to take shape.

Three of the early papers (5, 1922; 7, 1923; and 8, 1924) deal with the distribution of $\chi^2$ in tests involving the comparison of discrete frequencies with their expectations. Fisher was here fighting a battle to show that the introduction of the concept of degrees of freedom would straighten out inconsistencies which it was beginning to be seen must follow from Pearson’s formulation of the $\chi^2$ test. He won his point, but was handicapped, as all who try to tackle these problems must be, by the impossibility of reaching any final results in neat mathematical terms because of the discontinuous character of the multinomial distribution. His style of presentation, too, did not perhaps make it easy for his elders to grasp what he was after. The reviewer was particularly interested to see paper (7), which he does not think he had ever seen before, probably because of its publication in Economica. It contains perhaps the clearest exposition of the three, and would certainly have helped him in puzzling out the problem at the time, had he read it. One is glad to see paper (4, 1922) made available from the Annals of Applied Biology, giving a demonstration, though perhaps not what would be regarded as a mathematical proof, that the index of dispersion for samples from a Poisson distribution, $\sum(x_i-\bar x)^2/\bar x$, is distributed as $\chi^2$.

* [In view of the author’s comment on the reason for the absence of the Metron papers, it should be made clear that permission to reproduce this paper from Biometrika would of course have been given had it been asked for. Ed.]

Papers (10, 1922) and (11, 1925) contain Fisher’s fundamental work on his theory of estimation; to 
him they formed the basis for the later development of his theory of inductive inference, continued in 
nos. (24, 1934) and (26, 1935) and through the series of papers on fiducial probability (22, 1930), (25, 1935), 
(27, 1936) and (35, 1939). We are carried into a field where the construct of a mathematical theory must 
be linked with the processes of rational thought. Fisher’s work in this category has been fruitful, in part 
directly and in part by the challenge thrown out, which has forced statisticians to think and where they
disagree to clarify and formulate their own ideas. 

A point of some interest in connexion with the development of Fisher’s work is that while the first uses 
of the analysis of variance (e.g. in randomized block and Latin square designs) were being illustrated on 
data of field experiments in the early editions of Statistical Methods for Research Workers (1925, 1928), 
there was no parallel presentation of the underlying theoretical model. The paper (12, 1924) read before 
the International Mathematical Congress at Toronto, but not published till 1927, involved little more 
than a restatement of results that had been established for regression problems in (6, 1922), while the much 
later Econometrica paper (13, 1935) is only concerned with the properties of the $\chi^2$, $t$ and $z$ distributions,
and not with why the last distribution could be used in comparing component sums of squares in an
analysis of variance derived from a particular experimental design. In more than one place in this volume, 
Fisher speaks somewhat scornfully of the mathematical statistician who attempted during the twenties 
and early thirties to teach statistics in the universities. He could not have realized the efforts involved 
in drawing up from the imperfectly presented outline of his ideas, proofs which could be accepted in the 
classroom. We do not know at what stage he himself realized the beautiful simplicity of his ‘normal 
theory’ analysis of variance model : the observations represented in multiple space by a point P belonging 
to a probability density cluster with hyperspherical contours; the problems of the experimenter expressible 
in terms of questions regarding the unknown co-ordinates of the centre O of that cluster; the art of analysis 
consisting in a cunning rotation of axes so that by a comparison of the squares of the projections of OP 
in separate co-ordinate spaces, the significance of ‘effects’ could be judged. Viewed from the geometrical 
angle, one ‘sees’ the result is true; one does not need an algebraic proof. But he did not, perhaps, give us 
very much help to see. A short section (pp. 97-8) in the paper dealing with the applications of ‘Student’s’ 
distribution (Metron, 1925, 5, no. 3), not reissued here, was no doubt written to establish this general idea, 
but its practical interpretation was not driven home convincingly. It was left to others, Irwin, Cochran, 
Kolodziejczyk, to attempt to translate the full consequences of the geometrical concept into more ordinary
algebraic terms. 

But there was perhaps another reason why no great effort was made to make clear the structure of this 
particular mathematical model. In the handling of agricultural experimental data it was soon realized 
that this construct, with its normal residual variation superimposed on a series of additive terms repre- 
senting ‘treatment’ effects, which alone made the probability levels of the tests ‘exact’, often could not 
represent the true picture of experimental variation. Thus emphasis was placed on the need for randomization, and the reference set which gave meaning to the tests of significance became the randomization set and not the set that would be generated by resampling of the normal residuals. This second interpretation was justified practically, but it meant that the probability integrals of $z$ or $F$ are introduced
into the tests as approximations only. Generally very good approximations no doubt, but still approxi- 
mations whose accuracy has hardly yet been fully explored. On this extremely important point, the 
present volume has, however, scarcely a word to say from the mathematical angle. It was inevitable that 
many genuinely misunderstood what was in the minds of mathematicians at Rothamsted. 

An interesting paper of a relatively early date is reprinted from the Journal of the Ministry of Agriculture 
(17, 1926); in this Fisher explains in simple terms the objectives that were being aimed at in the planning 
of experimental lay-outs. Printed just before this is the paper on tests of significance in harmonic analysis 
(16, 1929), in which he exercised his skill in hyperspace geometry to derive the distribution of the ratio 
of extreme value to the mean (or rather n times the mean) in samples of n from an exponential distribution. 
This paper has been completed by the addition of a new table giving the 5 and 1 % points for the ratio, 
for the range $n = 5$ to 50.

Two outstanding papers from the mathematical viewpoint belong to this middle period (14, 1928) and (20, 1929). In the former Fisher derived the sampling distribution, (A), of the multiple correlation coefficient $R^2$, in series form, and also two allied distributions, one, (B), which has been termed the distribution of a non-central $\chi^2$ and a second (C) which is the distribution of what may be termed a non-central $F$, involving the ratio of a non-central to a central $\chi^2$. It is likely that all three distributions have a larger part to perform in statistical theory than they have so far played. In the second paper of 1928 he took up the problem of the sampling moments of moments and product moments. A new combinatorial technique, accompanying his introduction of the $k$-functions whose expectations were exactly the cumulants or semi-invariants $\kappa$, broke through what had come to appear an impassable barrier to further advance. The paper (21, 1930), giving the derivation of the moments of $g_1 = k_3/k_2^{3/2}$ and $g_2 = k_4/k_2^2$, provides an excellent example of the application of this technique.

The year 1930 saw the publication of the first of the four papers included in this series on fiducial pro- 
bability (22, 25, 27 and 35). There have been one or two others, not included. The subject is one of those 
which has been surrounded with controversy and, as in other instances in Fisher’s scientific career, this 
has undoubtedly arisen in part because many of his readers have been unable to grasp his meaning. This 
difficulty has arisen particularly over his use of the theory of probability. Thus in this 1930 paper there 
appears to be a clear statement that he regards as of first importance the relation between probability 
and frequency in repeated trials. Speaking (in 22, p. 534) of the lower fiducial limit for the correlation
coefficient ρ when the sample r = 0·99, he writes: ‘The value of ρ can then only be less than 0·765 in the
event that r has exceeded its 95 % point, an event which is known to occur just once in 20 trials. In this
sense ρ has a probability of just 1 in 20 of being less than 0·765.’ Similar linking of probability and
frequency in repeated trials could be found in many other places; for example, in his simple exposition of
experimental design (17, 1926), he illustrates what he means by 5 % significance levels by describing how
one might draw a line between the 25th and 26th largest deviation found in a series of 500 annual
observations. Yet as far as can be gathered from the comments in the author’s note accompanying the
fourth paper of this group, his critics have been seriously at fault for supposing that from 1925 onwards
he had regarded it as necessary to relate significance levels to frequency of occurrence in repeated
sampling! Annoyance at criticism, though regrettable, can be easily understood, but it is surprising that the
author who makes a point of the importance of aiding the student to avoid common misapprehensions
should have been content to leave his theory of fiducial probability with this far from clear presentation 
(35, 1939) of the Fisher-Behrens test. 

Although they are relatively accessible it is useful to have together Fisher’s papers from the Annals 
of Eugenics on problems arising in connexion with discriminant analysis (32, 1936; 33, 1938; 34, 1940;
and 36, 1939). A paper less easily available to statisticians and all the more welcome in this series is the 
joint paper from the Journal of Animal Ecology (43, 1943) dealing with the relation between the number of 
species and the number of individuals in a random sample of an animal population. 

A publication of this character, with its author’s notes assessing the value of his work as it appears to 
him to-day, is of course bound to be self-revealing. He has thrown himself open to the public view. Many 
of the notes will be helpful to the student reader, as intended, but in one instance at least the comments 
only serve to perpetuate an occasion when reason has been overcome by anger and emotion. Most readers 
will regret the inclusion of the Note on Paper (29, 1937) which, if nothing more, shows in its last sentences 
a profound ignorance of Karl Pearson’s character and, indeed, of his contemporaries. 

The volume is produced in an unusual form. The decision to print the papers in their original size, which 
leaves wide margins for all papers except those from the Philosophical Transactions of the Royal Society,
was a wise one. But it must be a matter of opinion whether it was necessary to reproduce photographically 
manuscript corrections of errors or crooked lines crossing out whole pages of text, presumably in the 
author’s own hand, or to let the veteran of 60 apologize for the young man of 30’s lack of balance in com- 
posing a paper. This is a style of presentation of a living author’s scientific work which may not be to 
everyone’s taste. Be that as it may, the main thing is to have these forty-two papers readily available, 
and for that and the life’s work which lies behind them, we should all like to express our gratitude. 


E. S. PEARSON 


Probability Theory and its Applications. Volume 1. By W. Feller. New York:
John Wiley and Sons Inc.; London: Chapman and Hall. 1950. Price 48s. 


This is the first part of a comprehensive treatise on probability theory. It deals with probabilities 
defined on discrete sample spaces only, and the more difficult theory of probability in a continuous 
sample space is left to a second volume. By imposing such a severe restriction on subject-matter in the 
first volume the author has left himself free to write a most thorough and inspiring account of this part 
of the theory. 

The first five chapters deal with the elementary combinatorial rules of discrete probability and their 
application to classical problems. Although most of this is not new, the discussion is elegant, stimulating 
and modern in flavour, for example in giving an interesting account of the difference between Bose- 
Einstein and Fermi-Dirac occupation problems. 


Two further chapters deal with the Binomial and Poisson distributions and their approximation by 
the normal distribution function, which is introduced here solely as a mathematical function. Three more 
chapters deal with the laws of large numbers and more general types of distribution function. These give 
the best and clearest account of the laws of large numbers for discrete distributions that the reviewer 
has come across. 

After chapter 11, which deals with generating functions, we come to the more original part of the 
book. This begins with two most interesting chapters on the theory of recurrent events, which has been 
largely developed by the author in order to apply it to the remaining part of the book, consisting of four 
chapters on random walk theory, Markov chains, and evolutive stochastic processes with a continuous 
time. Much of the theory of these subjects is here presented in a highly original way. 

This book is a treatise on probability and not on statistics, and as such must be regarded as the most 
important written in modern times. However, no attempt is made to provide anything like a compre- 
hensive bibliography. At the end of each chapter is a series of exercises, many of them very interesting 
in themselves. From these one can apparently deduce that professors at Cornell often park their cars 
where parking is forbidden, play a great deal of poker and bridge, and sometimes have difficulty in 
finding the right key for their front doors on arriving home. The style is clear and amusing and there 
seem to be remarkably few misprints. 

A book of this order of importance deserves the closest scrutiny and the following criticisms on 
matters of detail are put forward for that reason. On p. 24 the word ‘sample’ is restricted to samples 
drawn in a specified order and this may confuse some readers. On p. 38 the author says that a certain 
hypothesis about a particular set of data is to be rejected on the ground that the sample would have had 
an extremely small probability if this hypothesis were true. This is certainly not a valid method of
reasoning in the case here presented, in which it is easy to construct examples in which the probability
of any event is extremely small. For example we cannot reject the hypothesis that all permutations of
a pack of 52 cards are equally likely because the probability of a particular observed order is about 10⁻⁶⁸.
Similarly the statement on p. 56 that with a 1 % level of significance we shall find assignable causes 99 
out of 100 times is also clearly untrue. On p. 57 the statement that only the one-dimensional theory of 
runs has been considered ignores work done by Krishna Iyer and others. In the footnote on p. 120 it is 
not clear why only one degree of freedom is used in the χ² considered, and on p. 210 there is a misprint
in example 7.


P. A. P. MORAN
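
The order of magnitude in the card example of the foregoing review can be checked by direct arithmetic. The following is a minimal sketch; the variable names are chosen here for illustration. Under the hypothesis that all orderings of the pack are equally likely, every single ordering has the same probability 1/52!, which is why the smallness of that probability cannot by itself count against the hypothesis.

```python
from math import factorial

orderings = factorial(52)   # number of distinct orderings of a 52-card pack
p_single = 1 / orderings    # probability of any one specified ordering

print(f"52! has {len(str(orderings))} digits")
print(f"probability of one particular ordering: {p_single:.3e}")
```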


Experimental Designs. By William G. Cochran and Gertrude M. Cox. ix + 454 pp.
New York: John Wiley and Sons Inc.; London: Chapman and Hall. 1950. Price 46s. 


Analysis and Design of Experiments. By H. B. Mann. x+198 pp. New York: Dover 
Publications. 1950. Price $2.95. 


These two books have similar titles and they both deal with the theory and practice of the wide range of 
statistical methods included under the general heading of ‘analysis of variance’. Apart from this they 
have little in common. The difference is only mildly described by saying that the first book is more 
concerned with practice and the second more concerned with theory. The title of the book by Cochran 
and Cox is a fair indication of the contents, but it is to be wished that Prof. Mann had given a more 
accurate indication of the contents of his book in the title he assigned to it. An essential phrase in this 
title should be ‘mathematical theory of’, or words to the same effect. 

Experimental Designs should prove especially valuable to statisticians requiring a compendium of all 
the more useful designs accompanied by an outline of the relevant theory, and also to experimenters 
with sufficient time, inclination and ability to apply modern statistical methods in a serious manner to 
the design, analysis and interpretation of their work. The first three chapters contain a brief introduction 
to tests of significance, estimation, and related concepts, followed by an account of the type of procedure 
employed in the analysis of variance. A warning must be given that the reader cannot expect to 
appreciate this part of the book without previous study of general statistical theory and some experience
in the application of at any rate the simpler forms of analysis of variance. The third chapter, in 
particular, while an excellent summary of the principles underlying the analysis of variance, will not be 
fully understood at a first reading, without a background built from experience of the problems arising 
in a variety of experimental situations.

The next twelve chapters deal with different forms of experimental design. Each chapter contains
a description of the conditions under which the particular design is likely to be useful, and the
appropriate analysis, with a simple account of its theoretical basis. In the more complicated cases a selection
of suitable designs (‘Plans’) is given at the end of the chapter. This feature greatly increases the value
of the work as a reference book.

The sixteenth chapter (‘Random permutations of 9 and 16 Numbers’) constitutes a rather odd,
though useful, tailpiece.

It is claimed that Analysis and Design of Experiments is ‘invaluable to three groups: mathematicians,
students and teachers, practical experimenters and statisticians.’ One cannot help feeling that this
claim becomes less reasonable on passing through these groups and is definitely over-optimistic in
regard to the later classes. It should be clearly understood that this is a mathematical text-book
written in an advanced and concise style. This being accepted, however, it can be said that the author
has produced a very good book of this kind, which would be even better without the occasional brief
and rather lonely references to ‘practical applications’.

Five pages suffice to cover the fundamental results on the χ² and F distributions; a further ten pages
cover ‘the multivariate Normal distributions’ and the distributions of quadratic forms. Chapters 3-6
deal with general analysis of variance theory, including the approach via the likelihood ratio test applied
to the general linear hypothesis. The next three chapters cover orthogonal Latin squares (treated at
relatively great length) and the construction and analysis of incomplete block designs. There follow
chapters on non-orthogonal data, factorial experiments, randomized blocks and quasi-factorial designs,
analysis of covariance (two pages!) and interblock estimates and interblock variance. In addition to
the usual F-tables, the author includes Tang’s tables (Statistical Research Memoirs, 2, 126), from which
the power function of the analysis of variance tests can be evaluated in certain cases.

Despite its austere style and the inequalities of treatment of the various parts of the subject, this book
is a useful summary of the modern mathematical theory which has grown up around the analysis of
variance.


N. L. JOHNSON


First Course in Probability and Statistics. By J. Neyman. New York: Henry Holt
and Co. Inc. 1950. Price 30s.

As indicated in the Preface, this book follows the lines of a course of lectures given mainly to beginners
in mathematical statistics at the University of California. Thus it is assumed that the reader’s
mathematical knowledge is limited to ‘high-school algebra’, and to avoid the use of calculus the problems
tackled are those in which the variable concerned has a discontinuous distribution. What the author has
termed the fundamental probability set remains, therefore, finite, and theorems in probability can be
derived rigorously without making an appeal to the theory of sets. But while the mathematics can in this
sense be kept simple, the author aims at introducing from the beginning what he regards as the basic
concepts of statistical theory and connecting these concepts with various fields of application. Certain
starred sections carry the subject rather further for graduates who, while beginners at statistics, will in
general be of substantially higher intellectual maturity.

The special class of students in view has no doubt dictated to some extent the line of approach to the
subject, but as in the case of any course of lectures given by a teacher of outstanding individuality, the
structure of the book reflects strongly his own outlook on the subject. For this reason it is particularly
interesting to have set out in simple form what may be termed Prof. Neyman’s philosophy of statistical
thought. The characteristic feature of his contribution to statistics during the past 25 years has lain in
his conviction that the more empirical British and American approach to mathematical statistics needed
strengthening by a closer and more sure integration with the theory of probability. This need arises not
only in the development of the mathematical theory of distributions, but more importantly in defining
the relation between our calculus and the way we think.

In a paper of 1937 Neyman suggested that we might properly describe the adjustment of our action
to a limited amount of observation as ‘inductive behaviour’. In ordinary affairs of life the conscious part
of this adjustment is based on certain rules: when I see this happening, then I do that. Inevitably we
judge the value of the rules by the consequences which result from following them. This general concept
may be translated into the mathematical field of probability and statistics. When the observations with
which we are faced are of a random or probabilistic nature, the study of these rules of inductive behaviour
becomes the concern of the mathematical theory of statistics. This uses the theory of probability to derive
the performance characteristics of the rules and, in particular, to study how to determine rules which
have certain desirable or optimum properties. Thus mathematical statistics is a branch of the theory
of probability. The basic ideas, now developed to a more formal stage, are those which emerged from the
association of the author and the reviewer during the years 1927-33.

The plan of the book is as follows. Chapter 1 is of an introductory character, indicating where the
systematic instruction in probability theory of the next three chapters is leading. Chapter 2 deals with the
fundamental concepts and theorems of probability; a long section on ‘competing risks’ illustrates how 
these concepts may be used to construct a mathematical model which could be employed in studying the 
effectiveness of a specified treatment of a recurrent disease. Chapter 3 shows how probability models 
underlie the theory of genetics. That the models here used may not be adequate, through inevitable sim- 
plification, to describe observed phenomena is perhaps not material to the author’s theme. For, he 
writes, ‘since the main purpose of discussing genetics is to provide interesting illustrations of the applica- 
tion of probability, it is natural to introduce limitations’. In Chapter 4 the concept of a random variable 
is introduced and the binomial and hypergeometric distributions are derived and discussed. Since the 
book confines attention to finite sets, the normal distribution is only considered as providing a means of 
approximation to the sum of binomial terms. 

Finally, with the student thoroughly grounded in probability theory, the volume is completed by a 
90-page section (Chapter 5) on ‘Elements of the theory of testing statistical hypotheses’. This provides 
the main purpose of the book. Within the limits imposed, a simple and precise exposition is given. The 
argument is developed stage by stage, through a series of definitions whose meanings are illustrated by 
examples and driven home by problems and exercises left for the student to solve. Considerable space is 
given to a discussion of R. A. Fisher’s problem of the ‘lady tea-taster’, and considerations involved in the 
choice of a critical region are clearly illustrated on a bivariate problem, such as might arise in using two 
tests in a screening procedure for tuberculosis. The concepts of a best critical region and of a uniformly 
most powerful test emerge simply and naturally; the chapter ends with a brief discussion of the likelihood 
ratio or λ-principle as a means of determining an appropriate test.

The style of presentation is frankly didactic, with ideas driven home by repetition in the exercises. But 
this must be expected, since one object of the book is to get across to first-year College boys ideas that 
have more commonly been reserved for a later stage of instruction. 

In his introductory chapter the author has suggested that a common pattern can be traced in the 
development of different branches of mathematics. There is first a period in which certain ‘permanencies’ 
are observed among natural phenomena and these create a number of problems in inductive behaviour. 
Next, the mathematician steps in and attempts to form abstract models of the phenomena, elevating the 
rather vaguely appreciated permanencies to the level of basic concepts and axioms, testing the adequacy 
of the models and giving greater precision to the rules of behaviour. Finally, a stage is reached where 
most of the mathematical problems considered are dictated by pure curiosity as to the consequences of 
the axioms assumed and their mutual relations, one with another; the mathematical discipline becomes 
then a branch of pure mathematics. 

Neyman, himself, may be described as standing between the second and third stage of this picture. 
He is intensely fascinated by the construction of his models, but through his many contacts with research 
establishments endeavours to keep these models in gear both with what the experimentalist observes 
and with the processes of thought which guide his behaviour. Nevertheless, it seems likely that a con- 
siderable proportion of the students whose first course in statistics is based on this book as text, will end 
by regarding statistics as a branch of pure mathematics, and in their own later research they may pay 
only lip service to the practical problems which the theory might be directed to solve. 

No doubt this dividing of ways is inevitable, but if so, it seems to follow that even at an initial stage,
courses in applied and in pure mathematical statistics must be treated separately. The basic concepts 
which Neyman has at heart can be introduced in several ways at an elementary level, but the drill required 
in training the applied statistician must surely involve more handling of numerical data, and less practice
in the formal presentation of simplified problems in a way to meet the requirements of a rigorous
mathematical discipline.


E. S. PEARSON


Sampling Methods for Censuses and Surveys. By F. Yates. Pp. xiv+318. London: 
Charles Griffin and Co. Price 24s. 


This is an excellent book. 

It is well known that for many purposes the selection of a sample from a population can give under 
favourable circumstances almost as much information as taking a complete census—in fact, in some 
cases the sampling method has been found more accurate, when there has been difficulty in obtaining 
a complete and reliable census return. This book gives a detailed discussion of the practical problems 
involved in planning and executing such a sample survey, the methods of extracting information from 
the data once it has been collected, and the determination of the accuracy of the final results. All the 
necessary formulae used in computations are set out and carefully explained: but the author has not 
given the theoretical derivation of these formulae (no doubt feeling that this book is not really a suitable 
place for the inclusion of a considerable amount of mathematical statistics). 







Chapter 1 discusses the relative advantages of sampling and complete census methods. Chapter 2 
shows how to avoid biased and unreliable samples (with illustrations pointing out the errors which can 
occur as the result of bad selection). Chapter 3 describes the various types of sample—the completely 
random sample, useful for a fairly homogeneous population, and the various kinds of stratified samples
which can be used to increase accuracy when the population to be studied contains units of very 
different sizes. Chapters 4 and 5 deal with the planning, execution, and analysis of the survey, including 
such questions as the exact determination of the units to be sampled, the instructions to be given to 
the field investigators, and the recording of the results on punched cards. Chapter 6 deals with the 
estimation from the sample of the means and totals of the population values, and chapter 7 the estima- 
tion of the error in these estimates, using various methods of sampling. In the last chapter the question 
of ‘efficiency’ of a sample survey is discussed. By performing a small ‘pilot’ survey, or by using 
experience gained in previous surveys, an estimate can be obtained of the efficiency of different methods 
of sampling the population. This enables the survey to be designed in such a way as to cost as little as 
possible for a given accuracy. Sometimes when the information obtained in a survey can be considered 
as of direct financial value, it may be possible to balance the cost of obtaining extra information against 
the benefits obtained from greater accuracy, and so obtain the most favourable result. 
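
The kind of estimate and error calculation described for chapters 6 and 7 can be illustrated by a minimal sketch for a stratified random sample. The formulae used are the standard textbook ones (stratum means weighted by stratum size, with a finite-population correction in the variance) and are assumed here rather than taken from the book; the function name `stratified_estimate` and the farm figures are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def stratified_estimate(strata):
    """Estimate a population mean and its standard error.

    strata: list of (N_h, sample_values) pairs, where N_h is the number of
    units in stratum h and sample_values is a random sample drawn from it.
    """
    N = sum(N_h for N_h, _ in strata)
    estimate = 0.0
    var = 0.0
    for N_h, ys in strata:
        n_h = len(ys)
        W_h = N_h / N    # stratum weight
        f_h = n_h / N_h  # sampling fraction within the stratum
        estimate += W_h * mean(ys)
        var += W_h ** 2 * (1 - f_h) * variance(ys) / n_h
    return estimate, sqrt(var)

# Hypothetical population: 900 small farms and 100 large farms, 5 sampled from each
strata = [(900, [10, 12, 11, 9, 13]), (100, [95, 105, 100, 110, 90])]
est, se = stratified_estimate(strata)
print(f"estimated mean = {est:.2f}, standard error = {se:.3f}")
```

Because the two strata differ greatly in size of unit, the error is governed by the variation within each stratum and not by the much larger variation between them, which is the gain in accuracy that stratification is meant to secure.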

The style throughout the book is very lucid, and no mathematical knowledge is assumed beyond 
simple algebra and the use of summation signs. Dr Yates makes an ingenious use of different styles 
of type to distinguish population values and estimates, which makes the formulae easy to read and 
comprehend (though less easy to distinguish when spoken). Wherever it is suitable, the methods are 
illustrated by worked examples. The index has been carefully prepared. There are a number of helpful 
practical hints scattered throughout the book: and, to sum up, the book is one which can be cordially
recommended to all concerned in the planning and execution of surveys.


CEDRIC A. B. SMITH


Symposium on Stochastic Processes. ‘Stochastic processes and statistical physics.’ 
By J. E. Moyal. ‘Some evolutionary stochastic processes.’ By M. S. Bartlett.
‘Stochastic processes and population growth.’ By D. G. Kendall. Published in the
Journal of the Royal Statistical Society, Series B, Vol. XI, 1949, pp. 150-282. Cost of
reprint 12s. 6d. post free. 


This symposium marks a break from the hitherto intensive pre-occupation with stationary series, with 
the accompanying concentration on correlogram and periodogram analysis. It is evident after reading 
the three papers presented, if not before, that there are unending fields of application for more general 
models. It is true that most of the situations considered are of a fairly precise physical nature, where
one has strong a priori reasons for expecting a particular mathematical model to prevail. That is, the
reasoning is direct rather than inverse. Even so, the problems arising are deep ones, and the methods
here developed go a long way towards their solution.

Mr Moyal treats several problems of mathematical physics, notably of quantum mechanics, where 
a stochastic element is present. It may be present only in the initial conditions; given these, the 
system develops deterministically. This is a ‘crypto-deterministic’ system. On the other hand, new 
random elements may be continually entering the system as in the Brownian motion, and this is the 
case of chief interest. 

The author’s presentation of his ‘theory of random functions’ is a fine piece of work. The material 
is necessarily rather condensed, but it shows almost throughout a unity and continuity of treatment 
which lighten the reading considerably. It is instructive to see the strict parallel between a calculus 
of random functions and a calculus of ordinary determinate functions. Modes of convergence,
properties such as differentiability and analyticity all have their direct analogues.

On the other hand, the abrupt hop from an analytic treatment to the use of abstract spaces would 
be rather disconcerting to one who had not previously read Kolmogoroff’s and Karhunen’s own works. 

Moyal defines a random function initially by giving the simultaneous distribution function of the 
dependent variable, ‘x’, for all arbitrary finite sets of values of the independent variable ‘t’. As he 
himself points out, such a definition is often insufficient if we wish to consider the behaviour of ‘x’ over 
continuous ‘t’ intervals, and in this connexion some interesting unpublished work due to Prof. Pitt is 
outlined. To the applied worker, this is, of course, a finesse. 

The author gives a definition of the ‘degree of indeterminacy’ of a random function in terms of the 
number of functional relationships required to determine the whole course of the process. This is clearly 
akin to H. Wold’s ‘rank of singularity’ (1938), although Moyal does not limit himself to the case 
where the functional relationships are linear. 













Moyal strongly emphasizes the importance of the auto-covariance function, whose analytical 
properties largely determine those of the process. This is clearly seen in the sections on stochastic 
convergence, differentiability, and the existence of the Riemann and Lebesgue integrals of the process. 

Among examples of the direct establishment of stochastic equations, the author gives the Brownian 
motion and the shot effect. The appropriate relations here are often written as usual differential 
equations, although the derivative of highest order does not exist. To avoid this Moyal uses two 
distinct differential operators w.r.t. time: 6/d¢ and d/dt. This seems as convenient a method as any. 

The very general treatment of the behaviour of an arbitrary stable linear system into which a stochastic 
impulse function is being injected is interesting, particularly in view of the prediction theory for such 
systems recently developed by Dolph & Woodbury (unpublished). 

The section on the deduction of equations giving the time behaviour of the distribution or characteristic 
function is of great interest. This is largely related to M. S. Bartlett’s work. Equation 8·1·8 is of great
generality, as it is obtained by substituting the limiting form of the ‘transition’ probability in the 
general consistency relation, and merely assumes that this limiting transition probability exists. It is 
tempting to surmise what this equation has for interpretation. If only the first derivate moment is 
non-zero the equation is the continuity equation of a fictitious ‘probability fluid’. The more general 
equation has also something of this character: if the variate increment in an infinitesimal time increment 
has a probability of a fixed order of magnitude, then the differential coefficients of the probability are 
to some extent restricted, so that spontaneous discontinuities do not occur. 
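
The interpretation suggested above can be written out as a sketch. The notation is an assumption made here, since the review does not reproduce equation 8·1·8: f(x, t) stands for the probability density of the variate and α_n(x, t) for the n-th derivate moment. In the expansion now commonly written in this notation, the general relation and the special case in which only the first derivate moment is non-zero take the following forms, the second being the continuity equation of a fluid of density f moving with velocity α_1.

```latex
% General relation (assumed notation: f = density, \alpha_n = n-th derivate moment)
\frac{\partial f}{\partial t}
  = \sum_{n=1}^{\infty} \frac{(-1)^n}{n!}\,
    \frac{\partial^n}{\partial x^n}\bigl[\alpha_n(x,t)\, f(x,t)\bigr],
\qquad
\alpha_n(x,t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\,
    E\bigl[(\Delta x)^n \mid x(t) = x\bigr].

% Special case: only \alpha_1 non-zero, giving the continuity equation
\frac{\partial f}{\partial t}
  + \frac{\partial}{\partial x}\bigl[\alpha_1(x,t)\, f(x,t)\bigr] = 0 .
```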

Moyal gives ample illustration of the advantages of the method in applications. For the Langevin 
equation with one degree of freedom, the very generality of the method makes a direct solution preferable, 
but its virtues are apparent in the case of several degrees of freedom. 

Quite apart from the results of his own researches there presented, Moyal’s paper is most 
impressive in the way it so firmly and naturally unites several branches of science. For the reviewer, 
at least, the reading of the article cast a new light upon the subject of stochastic processes and its scope. 

Prof. Bartlett chose to deal with some evolutionary processes, and, in the spirit of the symposium, 
gave an advanced treatment of several problems of genuine and urgent practical interest. His paper 
fell into two sections, each of which centred about a particular mathematical relation. 

For the first section, on additive processes, his relation was Wald’s identity for the number of random 
variates whose sum is to fall between fixed limits. Letting one limit degenerate to minus infinity, the 
author obtains an upper bound for the probability that the other limit is some time overshot. As the 
author briefly indicates, the result is also valid for the continuous case. A. Khintchine’s well-known 
monograph (1933) contains much that is relevant here, as it does also for the diffusion equation below. 
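
The character of such an upper bound can be illustrated by simulation. The block below is a minimal sketch under assumptions made here, not in the paper: the increments are taken to be normal with negative mean and unit variance, and the bound is taken in the standard exponential form exp(-θ₀a), where θ₀ is the non-zero root of E[exp(θX)] = 1 and a is the limit in question. The function name `crossing_frequency` and all parameter values are hypothetical.

```python
import random
from math import exp

def crossing_frequency(drift, barrier, steps, paths, seed=0):
    """Fraction of simulated random walks that reach the barrier within `steps` steps."""
    rng = random.Random(seed)
    crossed = 0
    for _ in range(paths):
        s = 0.0
        for _ in range(steps):
            s += rng.gauss(drift, 1.0)  # one normal increment with unit variance
            if s >= barrier:
                crossed += 1
                break
    return crossed / paths

drift, barrier = -0.5, 3.0
# For N(drift, 1) increments, E[exp(theta * X)] = exp(theta * drift + theta**2 / 2),
# which equals 1 at theta = 0 and at theta = -2 * drift.
theta0 = -2.0 * drift
bound = exp(-theta0 * barrier)
freq = crossing_frequency(drift, barrier, steps=200, paths=5000)
print(f"simulated frequency = {freq:.4f}, upper bound = {bound:.4f}")
```

The simulated frequency should fall below the bound, the gap arising because the walk overshoots the limit when it crosses it.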

The asymptotic results obtained on the basis of this identity for the insurance and renewal problems 
have, of course, been reached before by direct methods. This does not rob the demonstration of its 
interest, however, and it is plain to see that a whole class of asymptotic results could be thus obtained, 
with very little effort. 

The application of the theory of additive processes to obtain a distribution of Kolmogoroff’s was
a surprising one. Surprising, because it is by no means immediately seen that there is any connexion 
between this problem of deviations of an empirical distribution curve from theory, and stochastic 
processes. However, Prof. Bartlett’s rather condensed reasoning soon shows that there is, and the 
result follows delightfully simply. The problem is reduced to the consideration of the diffusion equation 
with given boundary conditions, and it is probable that there are related problems which could well be 
treated in this manner. 
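
For reference, the limiting distribution in question has the well-known series form K(x) = 1 - 2 * sum over k >= 1 of (-1)^(k-1) exp(-2 k^2 x^2), which is the kind of expression that arises from a diffusion equation with absorbing barriers. The sketch below merely evaluates this standard series; it is not a reproduction of Bartlett's argument, and the function name and tolerances are assumptions made here.

```python
import math


def kolmogorov_cdf(x: float, tol: float = 1e-16, max_terms: int = 100000) -> float:
    """Limiting distribution of sqrt(n) * sup|F_n - F|, evaluated from its series.

    K(x) = 1 - 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * x**2)
    """
    if x <= 0.0:
        return 0.0
    total = 0.0
    for k in range(1, max_terms + 1):
        term = math.exp(-2.0 * k * k * x * x)
        total += term if k % 2 == 1 else -term  # alternating signs
        if term < tol:  # remaining terms are negligible
            break
    # Guard against rounding slightly outside [0, 1] for very small x.
    return min(1.0, max(0.0, 1.0 - 2.0 * total))


if __name__ == "__main__":
    for x in (0.5, 1.0, 1.36):
        print(x, kolmogorov_cdf(x))
```

The familiar 5% point near x = 1.36 provides a convenient check on the series.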

The second section, on multiplicative processes and extensions of these, is built upon the relation
derived from the consistency condition and the limiting form of the transitional probability, to which
we have already drawn attention in Moyal’s paper. The equation is most used here as a relation 
between probability generating functions. The author concerns himself largely with the multivariate 
case, corresponding to several ‘states’, and it is soon apparent that the equation for a given Markov 
model may be written down almost directly, by following a few simple rules. One can say that the 
methods presented here virtually give an immediate solution to any problem of a Markov nature, if 
we mean by a solution that all probability considerations have been reduced to definite mathematical 
relations; in this case, certain partial differential equations. However, these equations were explicitly 
soluble only in two cases, one of them being the case of a conservative system when the equation was 
linear with linear coefficients. Otherwise, the way was blocked by the equations’ non-solubility. 
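
As an illustration of the kind of equation meant, here is a standard textbook case, not taken from Bartlett's paper. For a simple birth process in which each of the n individuals present gives birth with probability intensity λ, the transition n → n + 1 at rate λn contributes a term λ(z² − z)∂Π/∂z to the equation for the probability generating function Π(z, t). The symbols λ, z and Π are notation assumed here.

```latex
\[
  \frac{\partial \Pi}{\partial t} = \lambda z (z - 1)\,\frac{\partial \Pi}{\partial z},
  \qquad \Pi(z, 0) = z ,
\]
\[
  \Pi(z, t) = \frac{z\,e^{-\lambda t}}{1 - z\,(1 - e^{-\lambda t})} ,
\]
```

The solution for a single ancestor is the generating function of a geometric distribution with parameter e^{−λt}, as may be verified by substitution.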

The real need here seems to be a general treatment of the type of equation arising. That this will not 
be simple is shown by the equations’ intractability even in the deterministic case. Bartlett notes 
that the first few terms of a power series expansion gave a good approximation to the empirical results 
in one deterministic case; this suggests the feasibility of investigating series solutions of some sort for
the stochastic models also. An alternative is that a relatively large group of the equations and initial 
conditions encountered may ultimately involve only one or two functions not already tabulated. 
Tabulation of these would give the models some practical value. 

It is quite obvious that the construction of an equation in the characteristic function gives a method 
of great generality and power. Bartlett’s paper solves many problems and reveals yet more which 
remain to be solved.

Mr Kendall’s paper gives an almost complete exposition of population growth mathematics, from the 
very simplest deterministic model up to his own interesting treatment of more realistic explanations. 
Here, as in Bartlett’s paper, it is evident that we are now in a position to set up equations for models 
of almost any degree of refinement. The difficulty is to solve them. Many of these equations, Kendall’s 
(111) for example, are of new types and would repay investigation for their own sake. 

The author uses a variety of techniques: direct attacks and equations in the characteristic or
probability generating functions. His use of functionals of an arbitrary function to treat a continuous age
distribution is something which opens up new fields. He himself points out that the solutions have the 
drawback of a discontinuity for an age equal to current time. Nevertheless, this seems a very natural 
method of attacking the problem, and the difficulty could perhaps be overcome. After all, the discontinuity 
is not a meaningless one. 

It seems probable that Kendall’s ‘multiple phase’ process, which he develops in connexion with 
bacterial colonies, could be adapted to the case of an animal population with a birth-rate dependent 
upon age. This could be achieved by permitting reproduction in several phases with a probability 
intensity appropriate to the phase. Such a treatment would, of course, introduce the complication of 
two distinct ages, a temporal age and a ‘virtual’ age, determined by the phase reached; this effect
is, perhaps, realistic enough.

Kendall emphasizes strongly an effect observed by Feller, that the mean behaviour of a stochastic 
model does not necessarily coincide with the behaviour of the corresponding deterministic model. His 
treatment of the logistic case illustrates convincingly this rather unexpected fact. 
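
One way to see why the two must differ is the following short argument. It is a sketch under assumed notation, not Kendall's own derivation: let the birth and death intensities λ_n and μ_n of a population of size n satisfy λ_n − μ_n = an − bn², and write m(t) and σ²(t) for the mean and variance of n at time t.

```latex
\[
  \frac{dm}{dt} = E(\lambda_n - \mu_n) = a\,E(n) - b\,E(n^2)
               = a\,m - b\,(m^2 + \sigma^2),
\]
\[
  \text{whereas the deterministic model has}\quad \frac{dn}{dt} = a\,n - b\,n^2 .
\]
```

The term −bσ² is absent from the deterministic equation, so the mean cannot follow the deterministic curve whenever the variance is positive.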

The result for the estimate of the birth-rate in a purely reproductive process is a neat one. We may 
expect that the estimation problems will be formidable for more complicated models, but, as Kendall 
observes, the difficulty does not arise until the models themselves have been found worthy. The estimate 
variance is here based upon the supposition of a large number of replicates. This supposition may be 
fulfilled in the case of bacterial or insect populations, but for human populations, at least, we will rather 
possess a relatively long set of observations upon the same material. The estimate variance must thus
be derived rather differently. We can, of course, split the population into sub-populations, but whether 
these develop independently is another matter. 
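
The estimation problem may be made concrete by a small simulation. The sketch below assumes a pure birth process observed continuously over a fixed interval, and uses the usual maximum-likelihood estimate for that situation, the number of births divided by the total time lived by the population. It is not claimed that this is the form of Kendall's estimate; the function names and parameter values are illustrative.

```python
import random


def simulate_yule(lam, n0, t_end, rng):
    """Simulate a pure birth process on [0, t_end].

    Returns (births, exposure), where exposure is the integral of the
    population size over time, i.e. the total time lived.
    """
    t, n = 0.0, n0
    births, exposure = 0, 0.0
    while True:
        wait = rng.expovariate(lam * n)  # time to the next birth among n individuals
        if t + wait > t_end:
            exposure += n * (t_end - t)  # no further birth before the end
            return births, exposure
        exposure += n * wait
        t += wait
        n += 1
        births += 1


def birth_rate_mle(births, exposure):
    """Maximum-likelihood estimate of the birth-rate under continuous observation."""
    return births / exposure


if __name__ == "__main__":
    rng = random.Random(0)
    births, exposure = simulate_yule(lam=1.0, n0=50, t_end=3.0, rng=rng)
    print("births:", births, "estimate:", birth_rate_mle(births, exposure))
```

Here a single realization is used throughout, which is closer to the situation of one long record than to that of many replicates.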

The discussion following the presentation of the three papers amply attested to their value and to the 
stimulation which is felt even with a reading of them. The best possible commentary upon such 
a symposium would be the development within the next year or two of material for a third symposium,
and there is little doubt but that this will be realized.

P. WHITTLE
University Institute of Statistics,
Uppsala

REFERENCES


DOLPH, C. L. & WOODBURY, M. A., University of Michigan. Unpublished memorandum.

KHINTCHINE, A. (1933). Asymptotische Gesetze der Wahrscheinlichkeitsrechnung. Ergebn. Math.
4, no. 3.

WOLD, H. (1938). A Study in the Analysis of Stationary Time Series. Dissertation, Stockholm.


CORRIGENDA 
E. S. Pearson, Biometrika (1950), 37 


p. 386. A 2 has been omitted outside the brackets in equation (12), which should read: 


\[
  \ldots = \int_0^1 z\,du = 2 - 2\{F(X)\log_e F(X) - F(X-1)\log_e F(X-1)\}/f(X).
\]


(All Rights reserved)
BIOMETRIKA. Vol. 38, Parts 1 and 2


CONTENTS 


Major Greenwood (With Frontispiece)
Tables of the 5% and 0·5% points of Pearson curves (with argument β₁ and β₂) expressed in standard
measure. By E. S. PEARSON and MAXINE MERRINGTON
Regression, structure and functional relationship. Part I. By M. G. KENDALL
Partial and multiple rank correlation. By P. A. P. MORAN
An application of the distribution of the ranking concordance coefficient. By A. STUART
The effect of non-normality on the power function of the F-test in the analysis of variance. By F. N.
DAVID and N. L. JOHNSON
Efficiency of the method of moments and the Gram-Charlier Type A distribution. By L. R. SHENTON
On distributions for which the Hartley-Khamis solution of the moment-problem is exact. By H. P.
Estimation problems when a simple type of heterogeneity is present in the sample. By W. M. LONG
Some tests for randomness in plant populations. By MARJORIE THOMAS
Charts of the power function for analysis of variance tests, derived from the non-central F-distribution.
By E. S. PEARSON and H. O. HARTLEY
Some questions of distribution in the theory of rank correlation. By S. T. DAVID, M. G. KENDALL and
Note on an exact treatment of contingency, goodness of fit and other problems of significance. By
G. H. FREEMAN and J. H. HALTON
The geometry of estimation. By J. DURBIN and M. G. KENDALL
Testing for serial correlation in least squares regression. II. By J. DURBIN and G. S. WATSON
Bi-variate k-statistics and cumulants of their joint sampling distribution. By M. B. COOK
Random dispersal in theoretical populations. By J. G. SKELLAM
The frequency distribution of the product-moment correlation coefficient in random samples of any size
drawn from non-normal universes. By A. K. GAYEN
MISCELLANEA 


Some observations on the practical aspects of weighing designs. By K. S. BANERJEE
Test for the significance of the difference between means in two normal populations having unequal
variances. By D. G. C. GRONOW
REVIEWS 


R. A. FISHER’s ‘Contributions to Mathematical Statistics’  257
W. FELLER’s ‘Probability Theory and its Applications’, vol. I  259
WILLIAM G. COCHRAN and GERTRUDE M. COX’s ‘Experimental Designs’  260
H. B. MANN’s ‘Analysis and Design of Experiments’  260
J. NEYMAN’s ‘First Course in Probability and Statistics’  261
F. YATES’s ‘Sampling Methods for Censuses and Surveys’
‘Symposium on Stochastic Processes’  263





A volume of Biometrika containing about 400 pages, with plates and tables, will be published annually in two half- 
yearly issues. 


Papers for publication should either be sent to 
PROFESSOR E. S. PEARSON, Department of Statistics, University College, London, W.C. 1,
or if more convenient to
Dr JOHN WISHART, Statistical Laboratory, St Andrew’s Hill, Cambridge.
PROFESSOR M. G. KENDALL, London School of Economics, Houghton St., London, W.C. 2.
It is a condition of publication in Biometrika that the paper shall not already have been issued elsewhere, and will not be
reprinted without leave of the Editors. 
Contributors receive 25 copies of their papers free. Joint authors 15 copies each. 
The subscription price, payable in advance, is Inland 45s. net per volume and Abroad 54s. net (including packing and
postage). At present certain early volumes are out of print, but where available the cost is Vols. 1 to 32, £5 net and Vols.
to 36, 63s. net (excluding postage). Index to Vols. 1 to 5, 2s. net. Index to Vols. 1 to 15, 5s. net. Bound volumes, of
those volumes in print, are again available at a charge of 10s. extra per volume; also binding cases at 4s. 6d. each. Cheques
must be made payable to Biometrika, crossed “a/c Biometrika Trust” and sent to The Secretary, Biometrika Office, Department
of Statistics, University College, London, W.C. 1, to whom all orders for series, single copies and offprints should be addressed.
All foreign cheques must be drawn in sterling and on a Bank having a London Agency.


First Printed in Great Britain at the University Press, Cambridge
Reprinted by offset-litho by Percy Lund Humphries & Co., Ltd., Bradford








